ABSTRACT


As  applications  of  digital  systems  continue  to  expand,  the  need 
arises  for  better  methods  of  analysis  of  functions  of  discrete  variables. 
Particularly  important  is  the  ability  to  gauge  accurately  the  difficulty 
of  a  problem;  this  leads  to  measuring  a  function's  complexity.  This  in 
turn  requires  an  implementation-independent  model  of  function  evaluation, 
one  that  also  shows  the  contribution  of  individual  variables  to  the 
function's  complexity. 

One such model, called a decision tree, is introduced; it is
essentially  a  sequential  evaluation  procedure  where,  at  each  step,  a 
variable's  value  is  determined  and  the  next  action  chosen  accordingly. 
Decision  trees  have  been  used  in  switching  circuits,  data  bases,  pattern 
recognition,  machine  diagnosis,  and  remote  data  processing.  The  activity 
of  a  variable,  a  new  concept  that  measures  the  contribution  of  a  variable 
to  the  complexity  of  a  function,  is  defined  and  its  relation  to  decision 
trees is described. Based upon these results (which can be generalized
to recursive functions and hierarchies of relations), a complexity
measure is proposed. The use of that measure and of the concept of
activity in testing large systems (where a number of variables may be
inaccessible) is then examined, with particular emphasis on continuous
checking of systems in operation.
CHAPTER 1


INTRODUCTION 

As  applications  of  digital  systems  continue  to  expand,  the  need 
arises  for  better  methods  of  analysis  of  functions  of  discrete  (in 
particular,  binary)  variables.  Such  functions  often  represent  the  total 
knowledge  available  to  the  designer  or  engineer  about  a  system,  so  that 
their  understanding  is  critical  to  the  development  of  design  methods, 
cost  estimates,  troubleshooting  procedures,  etc. 

Of  particular  importance  in  this  regard  is  the  ability  to  gauge 
accurately  the  difficulty  of  the  problem.  This  enables  the  system 
designer  to  evaluate  the  feasibility  of  the  project,  select  tools  or 
methods  of  appropriate  capabilities,  and  evolve  cost,  time,  and  other 
estimates.  Thus,  a  crucial  part  of  the  analysis  is  measuring  complexity, 
more  precisely,  a  function's  complexity. 

Such  a  measure  can  be  defined  experimentally,  thereby  relating  it 
directly  to  human  experience,  as  is  the  case  in  software  science 
(Halstead  77).  The  development  can  also  be  analytical,  based  upon  some 
model  of  function  evaluation;  such  measures  often  quantify  the  expense 
of  some  resource  present  in  the  model,  such  as  time  and  memory  in 
concrete  complexity  theory  (Aho  74,  Garey  79),  or  logic  gates  in 
combinational complexity theory (Savage 76, Pippenger 77). These three
approaches, however, assume further knowledge about the system because
they are not implementation-independent. Moreover, they do not
explicitly show the contribution of individual input variables to the
complexity of the function.

An implementation-independent model of function evaluation is also
needed in computational complexity theory in order to establish lower
bounds on the amount of work required for evaluation. One such model,
which also allows an analysis of the contribution of individual variables,
is called a decision tree; it has been used to prove lower bounds on
sorting (Knuth 71), set manipulation (Reingold 72), and recognition of
graph properties (Rivest 76b). A decision tree is essentially a
sequential  evaluation  procedure  whereby  the  value  of  a  variable  is 
determined  and  the  next  action  (choice  of  another  variable  to  evaluate 
or  decision  as  to  the  value  of  the  function)  chosen  accordingly. 

Figure 1.1 shows a decision tree for sorting three elements. The
variables are all $\binom{3}{2} = 3$ possible comparisons between two elements and
are binary in that the result of comparing (a:b) is either $(a \ge b)$ or
$(a < b)$. Thus, each internal node of the tree has two children,
corresponding  to  the  two  possible  values  of  the  variable  evaluated  at 
that  node.  The  external  nodes  are  values  of  the  function,  in  this  case 
permutations. 

Since  decision  trees  are  models  of  sequential  evaluation,  they 
have  been  used  extensively  wherever  parallel  or  tabular  data  must  be 
converted  to  sequential  procedures,  as  in  decision  tables  (Metzner  77), 
switching  theory  (Lee  59,  Cerny  79),  machine  diagnosis  (Chang  70),  and 
data  base  queries  (Ullman  80),  or  when  inputs  are  provided  one  at  a  time, 
as  in  taxonomy  (Jardine  71,  Carey  72),  multistage  pattern  recognition 
(Sethi 77), and remote data processing (You 76). As a consequence,
decision  trees  have  come  to  be  known  under  many  names  and  disconnected 
results  about  them  appear  under  many  guises  in  the  technical  literature. 
This  work  generalizes  and  unifies  the  concept  of  a  decision  tree, 
presenting  the  first  formal  definition  of  it.  Known  results  are 
reviewed  and  the  various  measures  used  to  characterize  decision  trees 
are discussed. In particular, a comprehensive analysis of the
computational complexity of these measures is presented, including some
new results on the worst case testing complexity of Boolean functions.
The  activity  of  a  variable,  a  new  concept  that  measures  the  contribution 
of  a  variable  to  the  complexity  of  a  function,  is  defined  and  its 
relation  to  decision  trees  is  described.  These  results  are  subsequently 
generalized  to  relations  and  recursive  functions.  Based  upon  these 
developments,  a  complexity  measure  for  functions  of  discrete  variables 
is  proposed  and  its  use  in  testing  large  digital  systems  is  examined. 

The  exposition  concludes  with  an  assessment  of  the  work  done  and 
recommendations  for  future  research. 


CHAPTER  2 


PRELIMINARIES 


2.1.  Introduction 

As  mentioned  in  the  previous  chapter,  decision  trees  have  been  used 
in  a  number  of  areas,  including  computer  science,  biology,  engineering, 
and  management,  with  diverse  terminology  and  degree  of  generality.  The 
purpose  of  this  chapter  is  to  provide  some  basic  definitions  and  results 
and  to  establish  a  unified  terminology  in  which  to  express  the  general 
problem  as  well  as  the  various  special  cases  encountered  in  the 
literature. 

Some  elementary  concepts  from  Boolean  algebra,  graph  theory,  and 
concrete  complexity  theory  are  first  reviewed,  as  they  will  be  used 
throughout  the  following  chapters.  Decision  trees  and  diagrams  are  then 
defined,  starting  with  the  simplest  and  most  widely  encountered  family 
of  functions — the  Boolean  functions — and  generalizing  to  (partial) 
functions  of  discrete  variables.  The  special  cases  of  decision  tables 
and  identification  (taxonomy),  which  have  attracted  more  attention  from 
researchers  than  any  other  aspect  of  the  overall  problem,  are  then 
examined  within  the  established  framework. 

2.2.  Basic  Concepts 

2.2.1.  Boolean  functions 

The  following  definitions  and  results  can  be  found  in  any  textbook 
on  Boolean  algebra  or  switching  theory  (Harrison  65,  Friedman  75). 
A (completely specified) Boolean function of n variables,
$f(x_1, \ldots, x_n)$, is a mapping from $\{0, 1\}^n$ to $\{0, 1\}$, where
$\{0, 1\}^n$ denotes the n-fold cartesian product of $\{0, 1\}$, that is, the
set of binary n-tuples. The set of all n-tuples mapped by the function to
the value 1, $\{(x_1, \ldots, x_n) \mid f(x_1, \ldots, x_n) = 1\}$, is called
the set of minterms of f. A Boolean function can be specified by describing
the mapping (giving its "truth table") or by listing its minterms; it can
also be represented by a Boolean formula, usually in terms of the three
operations of disjunction (+), conjunction (·), and complementation
(an overbar, as in $\bar{x}$).
An important canonical representation as a formula is the disjunctive
normal form (DNF), formed by a disjunction of conjunctions, where each
conjunction includes all variables and represents a minterm; this form
can often be simplified by combining conjunctions to obtain a
sum-of-products (SOP), which can be minimized (with respect to the
number of conjunctions) by the well-known Quine-McCluskey algorithm.

A function of n variables can be expressed in terms of two functions
of n - 1 variables by means of Shannon's expansion theorem:

$$f(x_1, \ldots, x_n) = \bar{x}_i \cdot f(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_n)
  + x_i \cdot f(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_n),$$

for each choice of $x_i$, $1 \le i \le n$; this will be written

$$f(x_1, \ldots, x_n) = \bar{x}_i \cdot f_{(x_i=0)} + x_i \cdot f_{(x_i=1)}.$$
A function, $f(x_1, \ldots, x_n)$, is fictitiously dependent upon
variable $x_i$, $1 \le i \le n$, (or $x_i$ is a redundant variable for f)
exactly when $f_{(x_i=0)} = f_{(x_i=1)}$. In particular, a function is
fictitiously dependent on each of its variables exactly when it is a
constant; a function with no redundant variables is called intrinsic.
Since the domain of a Boolean function of n variables has cardinality
$2^n$ and its range has cardinality 2, there exist $2^{2^n}$ distinct
Boolean functions of n variables; it is easily shown that almost all (in
the sense of asymptotics†) Boolean functions are intrinsic.

† That is, the ratio of the number of items of interest to the total
number of items tends to 1 as the total number of items grows.

Example 2.1. Let the Boolean function of three variables,
$f(x_1, x_2, x_3)$, be given by the mapping:

    (0, 0, 0) → 0        (1, 0, 0) → 0
    (0, 0, 1) → 0        (1, 0, 1) → 0
    (0, 1, 0) → 1        (1, 1, 0) → 0
    (0, 1, 1) → 1        (1, 1, 1) → 1

This function can also be represented by the list of its minterms:

    {(0, 1, 0), (0, 1, 1), (1, 1, 1)},

or by its disjunctive normal form:

$$f(x_1, x_2, x_3) = \bar{x}_1 x_2 \bar{x}_3 + \bar{x}_1 x_2 x_3 + x_1 x_2 x_3,$$

which can be simplified to yield the minimum sum-of-products form:

$$f(x_1, x_2, x_3) = \bar{x}_1 x_2 + x_2 x_3.$$

It is easily verified that this function is intrinsic. □
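The definitions of this section can be checked mechanically on Example 2.1. The following Python sketch is an illustration only and is not part of the original development; the helper names (minterms, cofactor, is_redundant, shannon_holds) are chosen here for the purpose. It lists the minterms of f, verifies Shannon's expansion for each variable, and confirms that no variable is redundant.

```python
from itertools import product

def f(x1, x2, x3):
    """Minimum sum-of-products form of Example 2.1: (not x1) x2 + x2 x3."""
    return int((not x1 and x2) or (x2 and x3))

def minterms(func, n):
    """All binary n-tuples that func maps to 1."""
    return {xs for xs in product((0, 1), repeat=n) if func(*xs)}

def cofactor(func, i, value):
    """Subfunction f_(x_i = value); i is 1-based, the argument list keeps its length."""
    def g(*xs):
        xs = list(xs)
        xs[i - 1] = value
        return func(*xs)
    return g

def is_redundant(func, n, i):
    """x_i is redundant exactly when the two cofactors agree on every input."""
    f0, f1 = cofactor(func, i, 0), cofactor(func, i, 1)
    return all(f0(*xs) == f1(*xs) for xs in product((0, 1), repeat=n))

def shannon_holds(func, n, i):
    """Check f = (not x_i) f_(x_i=0) + x_i f_(x_i=1) on every input."""
    f0, f1 = cofactor(func, i, 0), cofactor(func, i, 1)
    return all(
        func(*xs) == (((1 - xs[i - 1]) & f0(*xs)) | (xs[i - 1] & f1(*xs)))
        for xs in product((0, 1), repeat=n)
    )

print(sorted(minterms(f, 3)))                            # the three minterms
print([is_redundant(f, 3, i) for i in (1, 2, 3)])        # all False: f is intrinsic
print(all(shannon_holds(f, 3, i) for i in (1, 2, 3)))    # True
```

The exhaustive enumeration takes $2^n$ evaluations per check, which is acceptable only for small n.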


2.2.2.  Trees 

The  following  definitions  and  results  are  standard  topics  in  graph 
theory (Harary 69) and computer science (Knuth 73).

A (finite) graph, G = (V, E), consists of a (finite) set of vertices
(or nodes), V, together with a set, E, of unordered pairs of distinct
vertices from V, called edges; if E is a multiset (i.e., elements may be
repeated), then G is called a hypergraph. A graph is called directed
(is a digraph) if each edge is an ordered pair; E is then considered as
an irreflexive relation on $V \times V$. Let S be a set of symbols; a graph is
vertex labelled if there is a function, $g: V \to S$; it is edge labelled if
there is a function, $h: E \to S$.

Let $e = (v_1, v_2)$ be an edge; then e and $v_1$ (and e and $v_2$) are said
to be incident; $v_1$ and $v_2$ are called adjacent vertices. The degree of
a vertex in a graph is the number of edges incident with it; in a
digraph, the degree of a vertex is the sum of the in-degree (the number
of edges directed towards the vertex) and the out-degree (the number of
edges directed away from the vertex). A cycle is an alternating sequence
of three or more vertices and edges, $v_0, e_1, v_1, e_2, \ldots, v_{n-1}, e_n, v_n$,
beginning and ending at the same vertex, in which each edge is incident
with the two vertices immediately preceding or following it, and such
that no two other vertices are identical. A graph without cycles is called
acyclic.

A  tree  is  a  connected  acyclic  graph;  it  is  easily  shown  that  a  tree 
with  n  vertices  must  have  exactly  n  -  1  edges.  A  rooted  tree  is  a  tree 
with  a  distinguished  vertex  called  the  root.  It  is  often  convenient  to 
have  a  definition  of  a  tree  that  introduces  more  structure  and  lends 
itself to inductive proofs; a tree is therefore defined recursively as
a finite nonempty set of vertices such that there is a distinguished
vertex called the root and the remaining vertices are partitioned into
zero or more disjoint sets, each in turn a tree (called a subtree). Such
a  tree  can  be  thought  of  as  a  connected  directed  acyclic  graph  where  all 
edges  are  directed  away  from  the  root. 

Thus  the  in-degree  of  the  root  is  zero  and  that  of  every  other 
vertex  is  one.  Vertices  with  nonzero  out-degree  are  called  internal; 
those with zero out-degree are called external vertices or leaves. A
full k-ary tree is one where every internal vertex has an out-degree of
k. [This terminology differs somewhat from that of (Knuth 73)]. Vertices
adjacent to the root are called its children and the root is their parent.
Since  a  single  path  exists  from  the  root  to  any  vertex  in  the  tree,  the 
depth  of  a  vertex  is  defined  as  the  number  of  edges  traversed  on  the 
path;  the  height  of  a  tree  is  defined  as  the  maximal  depth  of  any  vertex 
in  the  tree;  finally,  the  path  length  of  a  tree  is  the  sum  of  the  depths 
of  its  leaves. 

When the order of the children of each vertex is of importance, the
tree will be called ordered or planar (since the manner of imbedding the
tree in a plane is then relevant). Following the convention in use in
computer  science,  trees  will  be  drawn  with  the  root  at  the  top;  in 
ordered  trees,  subtrees  will  be  drawn  left  to  right. 

Example  2.2.  The  tree  of  Figure  1.1  (page  3)  is  an  ordered,  vertex 
and  edge  labelled,  full  binary  tree.  The  root  is  labelled  (a:b)  and  is 
at  depth  0;  the  leaves  are  labelled  by  the  permutations  of  (abc)  and  are 
at  depths  2  and  3;  the  height  of  the  tree  is  3.  □ 
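The quantities of Example 2.2 can be recomputed on a concrete tree. Figure 1.1 is not reproduced here, so the sketch below assumes one standard comparison tree for sorting three elements; the order of comparisons in the actual figure may differ. The nested-tuple representation and the function names are illustrative choices.

```python
from itertools import permutations

# Internal vertex: (label, left subtree, right subtree); leaf: a string naming
# the sorted order. The left branch is taken when the first element compared
# is smaller than the second. The shape is an assumption, not Figure 1.1.
TREE = ("a:b",
        ("b:c", "abc", ("a:c", "acb", "cab")),
        ("a:c", "bac", ("b:c", "bca", "cba")))

def leaf_depths(tree, depth=0):
    """List of (leaf label, depth) pairs, left to right."""
    if isinstance(tree, str):
        return [(tree, depth)]
    _, left, right = tree
    return leaf_depths(left, depth + 1) + leaf_depths(right, depth + 1)

def sort_with_tree(tree, values):
    """Follow one root-to-leaf path; values maps 'a', 'b', 'c' to numbers."""
    comparisons = 0
    while not isinstance(tree, str):
        label, left, right = tree
        first, second = label.split(":")
        comparisons += 1
        tree = left if values[first] < values[second] else right
    return [values[name] for name in tree], comparisons

depths = [d for _, d in leaf_depths(TREE)]
print("height:", max(depths), "path length:", sum(depths))

for a, b, c in permutations((1, 2, 3)):
    print((a, b, c), sort_with_tree(TREE, {"a": a, "b": b, "c": c}))
```

Every input is sorted with two or three comparisons, in agreement with leaves at depths 2 and 3 and a height of 3.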

2.2.3.  Concrete  complexity 

This  section  is  based  on  (Garey  79)  and  uses  the  same  terminology. 
Concrete  complexity  theory  is  concerned  with  measuring  the  computational 
complexity  of  algorithms,  usually  in  terms  of  time  and  space. 

An  algorithm  is  a  precise,  step-by-step  procedure  (e.g.,  a  computer 
program)  for  solving  a  problem.  A  problem  is  composed  of  parameters  of 
unspecified  value  and  a  question  to  be  answered,  and  is  specified  by 
describing  the  nature  of  its  parameters  and  the  properties  that  its 
solution  must  possess.  An  instance  of  a  problem  is  obtained  by  providing 
specific  values  for  the  problem  parameters.  An  algorithm  solves  a  problem 
if  it  is  guaranteed  to  provide  a  solution  when  applied  to  any  instance  of 
the  problem. 

Example 2.3. The following problem is well known as the minimum
cover problem. The problem parameters are a set, S, and a collection,
C, of subsets of S (i.e., $C \subseteq 2^S$); the problem question asks to find the
smallest subset, C', of C such that C' is a cover for S, that is, each
element of S belongs to at least one element of C'. One instance of the
problem is given by S = {a, b, c, d}, C = {{a}, {a, c}, {a, b, d}, {b},
{d}}, for which the solution is C' = {{a, c}, {a, b, d}}. An algorithm
to solve this problem must, for each problem instance, find the minimum
cover, C', or report that no such cover exists (which happens whenever
C itself is not a cover). □
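As an illustration of an algorithm in the above sense, the instance of Example 2.3 can be solved by exhaustive search. The sketch below is one straightforward method, not one prescribed by the text; the name minimum_cover is chosen here. It tries subfamilies of C in order of increasing size, so the first cover found is a minimum one.

```python
from itertools import combinations

def minimum_cover(S, C):
    """Smallest subfamily of C whose union is S, or None if C does not cover S."""
    S = frozenset(S)
    C = [frozenset(c) for c in C]
    for size in range(len(C) + 1):
        for family in combinations(C, size):
            if frozenset().union(*family) == S:
                return set(family)
    return None

S = "abcd"
C = ["a", "ac", "abd", "b", "d"]      # each string lists the members of one subset
print(minimum_cover(S, C))            # the cover {a, c}, {a, b, d}
print(minimum_cover("abc", ["a"]))    # None: C itself is not a cover
```

The search examines up to $2^{|C|}$ subfamilies and is therefore practical only for small instances.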

Concrete  complexity  theory  concentrates  on  measuring  the  time 
requirements  of  an  algorithm  on  some  reasonable  model  of  computation, 
such  as  a  Turing  machine,  a  register  machine,  or  a  general  purpose 
computer. These requirements are expressed as a function of the size of
the problem instance, that is, in a sense, as a function of the size of
the  input  to  the  algorithm.  The  size  of  a  problem  instance  is  measured 
by  encoding  the  instance  in  a  reasonable  manner  (that  is,  in  a  manner 
that  is  not  artificially  wasteful  of  space)  and  measuring  the  length  of 
the  code. 

Example 2.4. An instance of the minimum cover problem can be
encoded in binary by first giving the size of S, which takes about
$\log_2 |S|$ symbols, then coding each element of C by |S| digits, where the
i-th digit is 1 if the i-th element of S belongs to that element of C
and is 0 otherwise. This in turn takes $|C| \cdot |S|$ symbols, so that the
input has size $\log_2 |S| + |C| \cdot |S|$, which is of the order of $|C| \cdot |S|$ for
large sets. □
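The encoding of Example 2.4 can be written out for the instance of Example 2.3. The sketch below is a minimal illustration; the name encode_instance and the alphabetical ordering of the elements of S are assumptions made here, and a practical encoding would also need delimiters between the fields.

```python
def encode_instance(S, C):
    """Return (|S| in binary, one |S|-digit characteristic row per element of C)."""
    elements = sorted(S)                       # fixes which element is the i-th
    header = format(len(elements), "b")
    rows = ["".join("1" if e in c else "0" for e in elements) for c in C]
    return header, rows

header, rows = encode_instance("abcd", ["a", "ac", "abd", "b", "d"])
print(header)                                  # 100
print(rows)                                    # ['1000', '1010', '1101', '0100', '0001']
print(len(header) + sum(len(r) for r in rows)) # 23 symbols in all
```

The rows contribute $|C| \cdot |S| = 20$ of the 23 symbols, which is why the header is negligible for large sets.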

The  time  complexity  function  expresses  the  time  requirements  of  an 
algorithm  by  giving,  for  the  size  of  each  instance,  the  maximum  amount 
of  time  spent  by  the  algorithm  to  solve  a  problem  instance  of  that  size. 
Since different encodings will result in somewhat different size
measures, the function is usually expressed as the order of the rate of
growth of the time requirements. Specifically, a function, f(n), is
said to be $O(g(n))$ whenever there is a constant, c, such that
$|f(n)| \le c \cdot |g(n)|$ for all values of n. Thus, for instance, $3n + 5$ is $O(n)$,
$4n^2 + n$ is $O(n^2)$, and $Ae^n + n^k$ (with A and k constants) is $O(e^n)$.

A polynomial time algorithm has time complexity no larger than
$O(p(n))$, where n is the input size and p is some polynomial function;
when  the  time  complexity  of  an  algorithm  cannot  be  so  bounded,  the 
algorithm  is  said  to  require  exponential  time.  Polynomial  time  is 
associated  with  efficient,  and  exponential  time  with  inefficient, 
algorithms,  as  is  illustrated  in  Table  2.1,  where  running  times  are 
tabulated  for  several  time  complexity  functions  and  instance  sizes, 
assuming each step to take one microsecond (1 μs) on present day
computers  (the  upper  numbers)  and  one  picosecond  (1  ps)  on  futuristic 
machines  (the  lower  numbers).  Not  only  are  exponential  time  algorithms 
incomparably  slower  than  polynomial  time  ones,  but  futuristic  machines 
bring  only  minor  relief,  whereas  they  considerably  speed  up  polynomial 
time  algorithms. 
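The contrast can be made concrete by a small computation. The sketch below does not reproduce Table 2.1; it assumes two illustrative step counts, $n^2$ and $2^n$, and a budget of one hour, and finds the largest instance size that fits in the budget at 1 μs per step and at 1 ps per step. The name largest_n is chosen here.

```python
def largest_n(steps, budget):
    """Largest n >= 0 with steps(n) <= budget, for a nondecreasing step count."""
    hi = 1
    while steps(hi) <= budget:        # double until the budget is exceeded
        hi *= 2
    lo = hi // 2                      # steps(lo) <= budget < steps(hi), or lo == 0
    while hi - lo > 1:                # binary search between lo and hi
        mid = (lo + hi) // 2
        if steps(mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

HOUR_AT_1_US = 3600 * 10**6           # steps executed in one hour at 1 microsecond each
HOUR_AT_1_PS = 3600 * 10**12          # steps executed in one hour at 1 picosecond each

for name, steps in (("n^2", lambda n: n * n), ("2^n", lambda n: 2 ** n)):
    print(name, largest_n(steps, HOUR_AT_1_US), largest_n(steps, HOUR_AT_1_PS))
# n^2 60000 60000000
# 2^n 31 51
```

A millionfold faster machine multiplies the solvable size by a thousand for the quadratic algorithm but adds only 20 to it for the exponential one.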

It is sometimes possible to show that a problem cannot be solved in
less than $O(f(n))$ time, for some function f. A well-known example is
sorting, which is known to require at least $O(n \cdot \log n)$ comparisons for
n objects. Often, however, such results cannot be attained. In particular,
there exist numerous problems for which only exponential time
algorithms are known, but which are not known to require this time
complexity. (Indeed, very few problems of practical importance have
been  shown  to  require  exponential  time,  unless  the  solution  itself  takes 
exponential  time  to  describe.) 

Of  particular  interest  among  the  latter  are  problems  belonging  to 
the class NP (for nondeterministic polynomial); those are all the
problems,  the  solutions  of  which  can  be  verified  in  polynomial  time. 

A  nondeterministic  machine  can  solve  an  NP  problem  by  "guessing"  a 
structure  and  verifying  that  it  is  a  solution  in  polynomial  time.  In 
particular,  of  course,  all  problems  solvable  in  polynomial  time  (the 
class  P)  are  in  NP.  One  of  the  most  important  open  questions  in  computer 
science  is  to  decide  whether  P  =  NP  or  not.  (The  available  evidence  is 
discouraging;  many  of  the  NP  problems  for  which  no  polynomial  time 
algorithm  is  known  are  of  great  practical  importance  and  have  received 
a  lot  of  attention  over  the  past  thirty  years,  but  to  no  avail.) 


Example 2.5. The so-called decision problem for the minimum cover
has the same parameters as the minimum cover problem, plus a constant,
k < |C|. The question is: does there exist a subset C' of C such that
C' is a cover for S and $|C'| \le k$? This problem is in NP since a
nondeterministic machine can "guess" a subset C' and verify in polynomial
time whether C' is a cover for S and $|C'| \le k$. Clearly, this problem is
a special case of the minimum cover problem, since the minimum cover
itself must be a solution to the decision problem, if any solution exists.
□
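The verification step that places the decision problem in NP is easy to write down. The following sketch is illustrative; the name verify_certificate and the representation of subsets as Python sets are assumptions made here. It checks a guessed subfamily in time polynomial in the size of the instance.

```python
def verify_certificate(S, C, candidate, k):
    """Check that candidate is a subfamily of C with at most k members covering S."""
    family = [frozenset(c) for c in candidate]
    available = {frozenset(c) for c in C}
    covered = set().union(*family)
    return (len(family) <= k
            and all(c in available for c in family)
            and covered >= set(S))

S = "abcd"
C = ["a", "ac", "abd", "b", "d"]
print(verify_certificate(S, C, ["ac", "abd"], 2))    # True
print(verify_certificate(S, C, ["ac", "abd"], 1))    # False: too many members
print(verify_certificate(S, C, ["a", "b", "d"], 3))  # False: c is not covered
```

Only the check is polynomial; nothing here says how the candidate is to be found.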


As part of the effort to solve the question of whether P = NP,
researchers identified a class of problems that are complete for the
set NP. That is, if any of these problems can be solved in polynomial
time, then so can all problems in NP; thus, in a sense, these problems,
called NP-complete, are the hardest problems in NP. The decision problem
for the minimum cover is an example of an NP-complete problem (Karp 72).

Since  the  analysis  leading  to  the  definition  of  NP-complete 
problems was done in terms of language membership, all NP-complete
problems  are  decision  problems,  that  is,  they  ask  a  question  about  the 
existence  of  a  particular  structure.  The  concept,  however,  can  be 
enlarged  to  optimization  problems,  as  illustrated  below  for  the  minimum 
cover  problem. 

As  seen  above,  the  decision  problem  for  the  minimum  cover  is 
NP-complete  and  is  a  special  case  of  the  minimum  cover  problem;  thus, 
in  a  sense,  the  latter  is  at  least  as  hard  as  the  hardest  problems  in 
NP. Such a problem is called NP-hard. However, as will be seen, the
minimum cover problem is no harder than NP-complete problems in the
sense that, if its associated decision problem has a polynomial time
solution (i.e., if P = NP), then that solution can be used to solve the
minimum cover problem in polynomial time. Such problems are then called
NP-easy; a problem that is both NP-hard and NP-easy is termed
NP-equivalent.

That the minimum cover problem is NP-easy can be seen by using the
standard technique of an intermediate completion problem. The completion
problem for the minimum cover has the same parameters as the decision
problem, plus a "partial solution" subset, $C'' \subseteq C$; the question is:
does there exist a subset, C', of C such that $|C'| \le k$, C' is a cover
for S, and $C' \supseteq C''$ (C' "completes" C'')? This problem is clearly in
NP; moreover, the decision problem is a special case of it, where C'' is
chosen as the empty set. Hence the completion problem is NP-complete.
Suppose now that P = NP, so that both the decision and the completion
problems have polynomial time solutions. Since the cardinality of the
solution set, C', is an integer between 0 and |C|, the decision problem
can be used $\log_2 |C|$ times, in a binary search over the interval, to
determine if a minimal solution exists and, if so, its cardinality,
$k_{\min}$; this clearly takes polynomial time. The completion problem can
then be used, with k set to $k_{\min}$, to build the solution set element by
element as follows. First, let C'' include a single element of C; for
at least one choice of element, C'' can be completed; keeping that
element, let now C'' include one other element of C; the process
continues until all elements have been found, using the completion
problem at most

$$|C| + (|C| - 1) + \cdots + (|C| - k_{\min} + 1) \le k_{\min} \cdot |C|$$

times, again a clearly polynomial time process. Thus, the minimization
problem can be solved in polynomial time if the decision problem can;
hence it is NP-easy.
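The argument above can be sketched as a program that solves the minimum cover problem using only yes/no answers from the decision and completion problems. In the sketch below the two oracles are brute-force stand-ins written for this illustration (the argument assumes polynomial time ones, which exist only if P = NP); all function names are hypothetical.

```python
from itertools import combinations

def completion_oracle(S, C, k, partial):
    """Stand-in oracle: is there C' with partial <= C' <= C, |C'| <= k, covering S?"""
    S = frozenset(S)
    C = [frozenset(c) for c in C]
    partial = frozenset(frozenset(c) for c in partial)
    rest = [c for c in C if c not in partial]
    base = frozenset().union(*partial)
    for extra in range(0, k - len(partial) + 1):
        for family in combinations(rest, extra):
            if base.union(*family) >= S:
                return True
    return False

def decision_oracle(S, C, k):
    """The decision problem is the completion problem with an empty partial solution."""
    return completion_oracle(S, C, k, frozenset())

def minimum_cover_via_oracles(S, C):
    C = [frozenset(c) for c in C]
    if not decision_oracle(S, C, len(C)):
        return None                       # C itself is not a cover
    lo, hi = 0, len(C)                    # invariant: a cover of size hi exists
    while lo < hi:                        # binary search for k_min
        mid = (lo + hi) // 2
        if decision_oracle(S, C, mid):
            hi = mid
        else:
            lo = mid + 1
    k_min = lo
    partial = set()
    while len(partial) < k_min:           # extend the partial solution one element at a time
        for c in C:
            if c not in partial and completion_oracle(S, C, k_min, partial | {c}):
                partial.add(c)
                break
    return partial

print(minimum_cover_via_oracles("abcd", ["a", "ac", "abd", "b", "d"]))
```

Apart from the oracle calls, the procedure does only polynomial work, which is the content of the NP-easiness claim.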
2.3. Decision Trees and Diagrams

2.3.1.  The  case  of  completely  specified  Boolean  functions 

Definition 2.1. Let $f(x_1, \ldots, x_n)$ be a (completely specified)
Boolean function. If f is a constant, then the decision tree for f
consists of a single vertex labelled by that constant. Otherwise, for
each $x_i$, $1 \le i \le n$, f has a decision tree composed of a root labelled
$x_i$ and two decision subtrees, the first for the subfunction $f_{(x_i=0)}$,
the second for the subfunction $f_{(x_i=1)}$. □

Thus,  decision  trees  for  Boolean  functions  are  explicit  illustrations 
of  Shannon's  expansion  theorem.  This  recursive  definition  closely 
parallels  that  given  for  trees  in  Section  2.2.2;  it  defines  decision 
trees  for  Boolean  functions  as  rooted,  ordered,  vertex-labelled,  full 
binary  trees.  (The  choice  of  ordering  rather  than  edge  labelling  to 
distinguish  subtrees  is  arbitrary  and  a  matter  of  convenience.)  To  an 
extent,  this  definition  prevents  redundant  testing  in  a  tree,  in  that  no 
more  testing  may  take  place  as  soon  as  the  function  has  been  reduced  to 
a  constant. 

The evaluation of a Boolean function represented by a decision tree
starts by ascertaining the value of the variable associated with the root
of the tree; it then proceeds by repeating the process, on the left
subtree if the variable was false or on the right subtree if the variable
was true, until a leaf is reached; the label of the leaf gives the value
of the function.
Example 2.6. The Boolean function of Example 2.1 was given by
the formula

$$f(x_1, x_2, x_3) = \bar{x}_1 x_2 + x_2 x_3.$$

Two possible decision trees for that function are shown in Figure 2.1.
Since decision trees are ordered, the left subtree of a node always
corresponds to the variable associated with the node being evaluated
at 0, the right subtree to the variable being evaluated at 1. Thus,
evaluation on the tree of Figure 2.1(a) for the triple of values
(0, 1, 0) would first examine variable $x_1$; on finding it to be 0, it
would proceed to the left subtree, there to examine variable $x_2$; since
$x_2 = 1$, the right subtree would next be used, thereby encountering a
leaf and terminating the evaluation. The label of that leaf, 1, is the
value of the function for the given triple of values, obtained by
examining only two of the three variables; the same evaluation on the
other tree would require that all three variables be examined. The first
tree represents the expansion:

$$f(x_1, x_2, x_3) = \bar{x}_1 \cdot (\bar{x}_2 \cdot 0 + x_2 \cdot 1)
  + x_1 \cdot (\bar{x}_3 \cdot 0 + x_3 \cdot (\bar{x}_2 \cdot 0 + x_2 \cdot 1)).$$
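The evaluation procedure can be traced in a few lines of code. In the sketch below the tree is read off the expansion given for the first tree of Example 2.6, since Figure 2.1 is not reproduced here; the nested-tuple representation and the names TREE_A and evaluate are illustrative assumptions.

```python
from itertools import product

# Internal vertex: (i, subtree for x_i = 0, subtree for x_i = 1); leaf: 0 or 1.
# Shape taken from the expansion of the first tree in Example 2.6.
TREE_A = (1,
          (2, 0, 1),
          (3, 0, (2, 0, 1)))

def evaluate(tree, xs):
    """Return (value, indices of the variables examined) for the assignment xs."""
    examined = []
    while isinstance(tree, tuple):
        i, low, high = tree
        examined.append(i)
        tree = high if xs[i - 1] else low
    return tree, examined

def f(x1, x2, x3):
    """The function of Example 2.1 in minimum sum-of-products form."""
    return int((not x1 and x2) or (x2 and x3))

print(evaluate(TREE_A, (0, 1, 0)))    # (1, [1, 2]): only x1 and x2 are examined
print(all(evaluate(TREE_A, xs)[0] == f(*xs)
          for xs in product((0, 1), repeat=3)))    # True: the tree represents f
```

The second line of output confirms that the tree and the formula agree on all eight inputs.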

Since  the  root  of  each  subtree  can  be  labelled  with  any  of  the 
untested variables, the number of possible decision trees for a given
function  is  in  general  very  large.  For  instance,  the  function  of 
Example  2.6  has  ten  distinct  decision  trees,  as  depicted  in  Figure  2.2. 
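The count of ten can be reproduced by direct recursion on Definition 2.1. The sketch below is illustrative and makes one assumption explicit: any untested variable, redundant or not, may label the root of a subtree as long as the subfunction is not a constant. The name count_decision_trees is chosen here.

```python
from functools import lru_cache
from itertools import product

def count_decision_trees(func, n):
    """Number of distinct decision trees of func under Definition 2.1."""
    @lru_cache(maxsize=None)
    def count(fixed):
        # fixed: tuple of length n with entries 0, 1, or None (variable untested)
        free = [i for i, v in enumerate(fixed) if v is None]
        values = set()
        for bits in product((0, 1), repeat=len(free)):
            xs = list(fixed)
            for i, b in zip(free, bits):
                xs[i] = b
            values.add(func(*xs))
        if len(values) == 1:              # constant subfunction: a single leaf
            return 1
        total = 0
        for i in free:                    # each untested variable may be the root
            low = fixed[:i] + (0,) + fixed[i + 1:]
            high = fixed[:i] + (1,) + fixed[i + 1:]
            total += count(low) * count(high)
        return total
    return count((None,) * n)

def f(x1, x2, x3):
    """The function of Example 2.6."""
    return int((not x1 and x2) or (x2 and x3))

print(count_decision_trees(f, 3))         # 10
```

Four of the trees have $x_1$ at the root, two have $x_2$, and four have $x_3$.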


In fact, it is easily seen that a Boolean function of n variables may
have up to

$$\prod_{i=0}^{n-1} (n - i)^{2^i} \qquad (2.3.1)$$

distinct decision trees (n choices are possible for the root, followed
by n - 1 choices on each of the two subtrees, or $(n - 1)^2$ choices; in
general, up to $(n - k)^{2^k}$ choices are possible at depth k). This
corresponds to the recurrence relation:

$$\mathcal{N}(n) = n \cdot (\mathcal{N}(n - 1))^2, \qquad (2.3.2)$$

which shows that $\mathcal{N}(n)$ grows at least doubly exponentially in n,
since it is more than squared at each step. The first few values of
$\mathcal{N}(n)$ are listed in Table 2.2.


Table 2.2

The Number of Decision Trees for Boolean Functions

    n    N(n)            n     N(n)
    1    1               6     1.65 × 10^13
    2    2               7     1.91 × 10^27
    3    12              8     2.91 × 10^55
    4    576             9     7.64 × 10^111
    5    1,658,880       10    5.84 × 10^224

(N(n) denotes the quantity $\mathcal{N}(n)$ of relation 2.3.2.)
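The entries of Table 2.2 follow from either the product (2.3.1) or the recurrence (2.3.2). The short sketch below, with function names chosen here, evaluates both with exact integer arithmetic and checks that they agree.

```python
from math import prod

def trees_by_product(n):
    """Upper bound (2.3.1): product of (n - i)^(2^i) for i = 0, ..., n - 1."""
    return prod((n - i) ** (2 ** i) for i in range(n))

def trees_by_recurrence(n):
    """Recurrence (2.3.2): N(n) = n * N(n - 1)^2, with N(1) = 1."""
    return 1 if n == 1 else n * trees_by_recurrence(n - 1) ** 2

for n in range(1, 11):
    value = trees_by_product(n)
    assert value == trees_by_recurrence(n)
    print(n, value if n <= 5 else f"{float(value):.2e}")
```

The value for n = 10 has 225 decimal digits, which still fits in a floating-point number for display purposes.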


Not  all  decision  tree  representatives  of  a  function  are  equally 
desirable.  Thus,  several  criteria  have  been  used  in  order  to  select  an 
appropriate representation; such criteria attempt to measure important
properties  of  decision  trees,  in  particular  their  realization  and  usage 
costs. 

In  the  most  general  case,  each  variable  has  an  associated  testing 
cost  (corresponding  to  the  time  needed  for  its  evaluation,  or  the  actual 
expense  incurred  at  each  evaluation,  or  some  other  cost  related  to  the 
determination  of  the  variable's  value)  and  an  implementation  (or  storage) 
cost (corresponding to the amount of hardware or memory necessary to
choose a path depending on the variable's value or to some other cost
related to the apparatus needed for decision making). If such costs
are  unknown,  they  are  taken  to  be  unity.  (A  third  cost  may  arise  in 
practice,  corresponding  to  the  cost  of  the  hardware  equipment  needed  to 
obtain  a  value  for  a  variable,  the  sensor  cost.  This  measure  has  no 
direct  relation  to  the  tree;  it  is  a  onetime  only  cost,  incurred  as  soon 
as  a  variable  is  tested  somewhere  in  the  tree,  and  is  significant  only 
for  redundant  variables;  it  will  be  further  discussed  in  Chapters  3  and 
4.) 

Based  on  this  information,  several  measures  have  been  defined  on 
decision  trees. 

Definition 2.2.

(a) The storage cost, α, of a decision tree is the sum of the
implementation costs of its nodes.

(b) The worst case testing cost, h, of a decision tree is the maximum,
taken over all the paths from the root to the leaves, of the path
costs (where the cost of a path is the sum of the testing costs of
the variables examined on that path).

(c) The total testing cost, η, of a decision tree is the sum, taken over
all the paths from the root to the leaves, of the path costs.

(d) The normalized testing cost, H, of a decision tree is the total
testing cost divided by the number of leaves. □

When  costs  are  unity,  the  storage  cost  reduces  to  the  number  of  internal 
nodes  in  the  tree,  the  worst  case  testing  cost  reduces  to  the  height  of 
the  tree,  while  the  total  testing  cost  reduces  to  the  path  length  of  the 
tree  and  the  normalized  testing  cost  reduces  to  the  average  path  length 
of  the  tree.  The  path  length  of  the  tree  is  itself  a  special  case  of 
the  tree  path  entropy  defined  in  (Green  73)  and  the  average  path  length 
a  special  case  of  the  normalized  tree  path  entropy. 


Example 2.7. Assume the following costs for the function of
Example 2.6.

                 x1   x2   x3
storage costs     1    2    3
testing costs     5    2    6

The various measures defined above are then computed for the two trees
of Figure 2.1 (page 18) and listed below, first with these costs and
then with unit costs.

measure   tree (a)   tree (b)   unit-cost measure     tree (a)   tree (b)
α             8          6      node count                4          3
h            13         13      height                    3          3
η            51         35      path length              12          9
H          10.2       8.75      av. path length         2.4       2.25

□
A given decision tree is likely to have common subtrees; those can
then  be  constructed  just  once  and  used  on  other  paths  of  the  tree, 
instead  of  being  duplicated  throughout.  The  structure  thus  created  is 
not  a  tree,  since  the  in-degree  of  some  nodes  may  be  greater  than  one; 
it  may  be  assimilated  to  a  vertex  and  edge  labelled,  directed,  acyclic 
hypergraph  where  all  the  nodes  have  out-degree  zero  or  two  and  where 
there  is  a  single  node  of  in-degree  zero  (the  root).  This  defines  a 
decision  diagram;  further  requiring  that  there  be  only  one  leaf  labelled 
1  (the  "finish"  node)  yields  a  free  Boolean  graph. 

By  definition,  then,  a  decision  diagram  is  associated  with  a 
decision  tree;  there  is  a  one-to-one  correspondence  between  the  paths 
in  the  tree  and  those  in  the  diagram.  Thus,  all  the  measures  of 
Definition 2.2 are valid on decision diagrams, taking the same values as
on their associated trees. However, the tree storage cost, α, is
inadequate since it does not describe the savings resulting from the
identification of common subtrees; it is replaced by the diagram storage
cost, β, which is defined as the sum of the implementation costs of the
diagram's nodes. When costs are unity, this reduces to the number of
internal nodes of the diagram.

Example 2.8. Consider again the function of Example 2.6; two free
Boolean graph representations, associated with the corresponding trees
of Figure 2.1 (page 18), are depicted in Figure 2.3. Diagram (a) has
β = 6 and so does diagram (b); both diagrams have three internal nodes,
in contrast to the corresponding trees. It is clear from Figure 2.3
that edge labelling is preferable to (subdiagram) ordering as a means to
distinguish edges, since diagrams have a decidedly more complex structure
than trees. □

Figure 2.3. The two free Boolean graphs of Example 2.8.

Further  information  is  often  available  about  a  function  in  the  form 
of  a  probability  distribution  on  the  variables'  values,  that  is,  a 
function p: {0, 1}^n → [0, 1]; when the distribution is not specified, it
can  be  taken  to  be  uniform,  that  is,  each  n-tuple  of  values  is  equally 
likely.  This  distribution  allows  a  quantitative  measurement  of  the 
average  behavior  of  a  decision  tree  or  diagram. 

Specifically,  a  path  from  the  root  to  some  node  can  be  assigned  a 
probability,  which  is  simply  the  sum  of  the  probabilities  of  all 
combinations  of  values  that  can  lead  to  that  node;  thus,  each  path  has 
an  expected  testing  cost,  which  is  the  product  of  its  probability  times 
its  cost.  The  expected  testing  cost,  E,  of  a  decision  tree  or  diagram 
is then defined as the sum, taken over all the paths from the root to
the leaves, of the expected path testing costs.


When  all  costs  are  unity  and  the  distribution  of  variables'  values 
is  uniform,  the  information  necessary  and  sufficient  to  compute  all  of 
the  various  tree  measures  defined  above  consists  of  the  number  of  leaves 
at each depth [or, equivalently, the number of internal nodes at each
depth, since one set can easily be computed from the other (Knuth 73)].
Thus a decision tree for a function of n variables can be entirely
characterized by an (n + 1)-tuple, (λ_0, ..., λ_n), where λ_i is the
number of leaves at depth i; this notation will be called leaf profile,
by analogy with a similar notation introduced in (Miller 79). The five
tree measures defined above can then be rewritten as simple functions of
the leaf profile:

node count, $\alpha = \sum_{i=0}^{n} \lambda_i - 1$;

height, $h = \max \{\, i \mid \lambda_i \neq 0 \,\}$;

path length, $\eta = \sum_{i=0}^{n} i \cdot \lambda_i$;

average path length, $H = \eta / \sum_{i=0}^{n} \lambda_i$;

expected number of tests, $E = \sum_{i=0}^{n} i \cdot 2^{-i} \cdot \lambda_i$.

The  leaf  profile  provides  more  than  a  convenient  shorthand  for  simple 
problems;  it  allows  a  lexicographic  ordering  of  decision  trees.  This, 
in  turn,  gives  rise  to  two  other  measures,  the  maximum  profile,  which 
ranks as best that tree which is largest in lexicographic order (on the
grounds that leaves should be encountered as soon as possible), and the
minimum reverse profile, which ranks as best that tree which is smallest
in reverse lexicographic order (on the grounds that long paths should be
minimized). Both measures are easily generalized to nonuniform
probability distributions (by replacing "number of leaves" by "probability
of leaves") but cannot be applied when nonunity costs are present.
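
As an illustration, the leaf-profile formulas and the two profile orderings can be sketched in a few lines of Python. The function name `profile_measures` is an assumption; the two profiles are those quoted for the trees of Figure 2.1 in Example 2.9, and unit costs with a uniform distribution are assumed throughout.

```python
# Sketch: tree measures computed from a leaf profile.
# profile[i] is the number of leaves at depth i.

def profile_measures(profile):
    leaves = sum(profile)
    alpha = leaves - 1                                     # node count
    h = max(i for i, k in enumerate(profile) if k)         # height
    eta = sum(i * k for i, k in enumerate(profile))        # path length
    H = eta / leaves                                       # average path length
    E = sum(i * k / 2**i for i, k in enumerate(profile))   # expected number of tests
    return alpha, h, eta, H, E

first = (0, 0, 3, 2)
second = (0, 1, 1, 2)

print(profile_measures(first))    # (4, 3, 12, 2.4, 2.25)
print(profile_measures(second))   # (3, 3, 9, 2.25, 1.75)

# Python tuples compare lexicographically, which matches both orderings.
print(second > first)                # True: larger profile
print(second[::-1] < first[::-1])    # True: smaller reverse profile
```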

Example 2.9. Given the function of Example 2.6, assume the
following probability distribution:

p: (0, 0, 0) → 0.10    (1, 0, 0) → 0.05
   (0, 0, 1) → 0.15    (1, 0, 1) → 0.05
   (0, 1, 0) → 0.05    (1, 1, 0) → 0.25
   (0, 1, 1) → 0.20    (1, 1, 1) → 0.15

Figure 2.4 shows the two trees of Figure 2.1 (page 18) with their node
probabilities; the expected testing cost of tree (a) is

$E_{(a)} = (0.25 + 0.25) \cdot (5 + 2) + 0.3 \cdot (5 + 6) + (0.05 + 0.15) \cdot (5 + 6 + 2) = 9.4$,

while that of tree (b) is

$E_{(b)} = 0.35 \cdot 2 + 0.25 \cdot (2 + 5) + (0.25 + 0.15) \cdot (2 + 5 + 6) = 7.65$.

Figure 2.4. The two decision trees of Figure 2.1 with their node
probabilities.

The leaf profile of the first tree is (0, 0, 3, 2) and that of the
second tree is (0, 1, 1, 2); thus, the second tree has both a larger
leaf profile and a smaller reverse leaf profile.
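
The two expected testing costs of Example 2.9 can be recomputed by charging each combination of values the cost of the path it follows. The sketch below is a hedged illustration: the tuple encoding is an assumption, and the tree shapes are inferred from the grouping of probabilities and costs in the two expressions above, not copied from Figure 2.4.

```python
# Sketch: expected testing cost E under a distribution on input combinations.

TESTING = (5, 2, 6)   # testing costs of x1, x2, x3 (Example 2.7)

P = {
    (0, 0, 0): 0.10, (1, 0, 0): 0.05,
    (0, 0, 1): 0.15, (1, 0, 1): 0.05,
    (0, 1, 0): 0.05, (1, 1, 0): 0.25,
    (0, 1, 1): 0.20, (1, 1, 1): 0.15,
}

# Assumed encoding: (variable index, subtree for value 0, subtree for value 1).
TREE_A = (0, (1, "leaf", "leaf"), (2, "leaf", (1, "leaf", "leaf")))
TREE_B = (1, "leaf", (0, "leaf", (2, "leaf", "leaf")))

def path_cost(tree, combo):
    """Sum of the testing costs of the variables examined for `combo`."""
    cost = 0
    while isinstance(tree, tuple):
        var = tree[0]
        cost += TESTING[var]
        tree = tree[1 + combo[var]]
    return cost

def expected_cost(tree, p):
    # Summing over combinations equals summing over paths, since a path's
    # probability is the sum of the probabilities of the combinations it serves.
    return sum(prob * path_cost(tree, combo) for combo, prob in p.items())

print(round(expected_cost(TREE_A, P), 2))   # 9.4
print(round(expected_cost(TREE_B, P), 2))   # 7.65
```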


2.3.2.  The  general  case 

Completely  specified  Boolean  functions  are  only  a  special  case  of 
the system description functions discussed in Chapter 1. In general,
the system is described by a (partial) function of discrete-valued
variables, f(x_1, ..., x_n), where each variable, x_i, can take on exactly
m_i values, for some strictly positive integer m_i; without loss of
generality, those values will be denoted 0, 1, ..., m_i - 1. Those
combinations, if any, for which no mapping is specified may, by
definition, be assigned any value in the range of the function. (In the
case of Boolean functions, such combinations are often termed "don't care
entries" and assigned the symbolic value φ.)

In the obvious way, let $f_{(x_i = k)}$ denote
$f(x_1, \ldots, x_{i-1}, k, x_{i+1}, \ldots, x_n)$. A variable, x_i, will be
deemed redundant if

$f_{(x_i = 0)} = f_{(x_i = 1)} = \cdots = f_{(x_i = m_i - 1)}$,

where $f(x_1, \ldots, x_n) = f(x_1', \ldots, x_n')$ if either both combinations are
mapped  to  equal  values  or  at  least  one  of  the  two  combinations  has  no 
specified  image.  A  function  will  be  called  constant  if  all  combinations 
of  values  for  which  a  mapping  is  specified  are  mapped  to  the  same  value; 
it  is  noted  that  a  function,  all  variables  of  which  are  redundant,  need 
not  be  constant. 
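
The last remark can be made concrete with a small sketch. The dictionary encoding of a partial function and the names `is_redundant` and `is_constant` are assumptions; for variables with more than two values the code takes the stricter reading that all specified images must agree, which coincides with the definition above for binary variables.

```python
from itertools import product

def is_redundant(f, domains, i):
    """True if variable i is redundant in the partial function f.

    f maps tuples of values to images; tuples absent from f are unspecified.
    domains[k] is the number of values taken by variable k.
    """
    others = [range(m) for k, m in enumerate(domains) if k != i]
    for rest in product(*others):
        images = set()
        for value in range(domains[i]):
            combo = rest[:i] + (value,) + rest[i:]
            if combo in f:
                images.add(f[combo])
        if len(images) > 1:   # two specified images disagree
            return False
    return True

def is_constant(f):
    return len(set(f.values())) <= 1

# Hypothetical partial function of two binary variables.
f = {(0, 0): "a", (1, 1): "b"}

print(is_redundant(f, (2, 2), 0))   # True
print(is_redundant(f, (2, 2), 1))   # True
print(is_constant(f))               # False
```

Both variables are redundant because every comparison involves an unspecified combination, yet the function takes two distinct values.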

The  following  is  the  natural  extension  of  Definition  2.1.  To  the 
author’s  knowledge,  it  is  the  first  formal  definition  proposed  for 
decision  trees. 


Definition 2.3. Let f(x_1, ..., x_n) be as above. If f is a
constant, then the decision tree for f consists of a single vertex
labelled by that constant. Otherwise, for each x_i, 1 ≤ i ≤ n, f has
a decision tree composed of a root labelled x_i and m_i decision subtrees,
corresponding to the subfunctions $f_{(x_i = 0)}, \ldots, f_{(x_i = m_i - 1)}$,
in order.

The  definition  of  decision  diagrams  is  similarly  extended. 
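
The recursive structure of Definition 2.3 translates directly into code. The sketch below is one possible reading, not a prescribed procedure: the names `restrict` and `build_tree` are assumptions, and the variables are tested in a fixed order only for simplicity, whereas the definition allows any variable at each step. The function used is the one given in Example 2.10.

```python
def restrict(f, i, value):
    """Subfunction obtained by fixing variable i to `value`."""
    return {combo: image for combo, image in f.items() if combo[i] == value}

def build_tree(f, domains, order):
    """Decision tree for the partial function f, testing variables in `order`.

    Assumed encoding: an internal node is (variable index, subtree for
    value 0, subtree for value 1, ...); a leaf is an image of f, or None
    when no specified combination reaches it.
    """
    images = set(f.values())
    if len(images) <= 1:                        # f is constant
        return images.pop() if images else None
    i, rest = order[0], order[1:]
    return (i,) + tuple(build_tree(restrict(f, i, v), domains, rest)
                        for v in range(domains[i]))

# Mapping of Example 2.10; the first variable is ternary.
F = {
    (0, 0, 0): "a", (1, 0, 0): "a", (2, 0, 1): "b",
    (0, 0, 1): "a", (1, 0, 1): "b", (2, 1, 0): "a",
    (0, 1, 1): "c", (1, 1, 0): "c", (2, 1, 1): "b",
}

print(build_tree(F, (3, 2, 2), (0, 1, 2)))
# (0, (1, 'a', 'c'), (1, (2, 'a', 'b'), 'c'), (1, 'b', (2, 'a', 'b')))
```

Unspecified combinations simply inherit the value of the leaf they fall into, which is the arbitrary assignment permitted for don't care entries.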

As  before,  storage  and  testing  costs  may  be  specified  as  well  as  a 
probability  distribution  on  the  combinations  of  variables'  values.  Unless 
all  variables  are  m-valued  for  some  fixed  m  (in  which  case  the  decision 
tree is a full m-ary tree), they will likely have different costs; thus
the concept of leaf profile is less useful for general functions than
for m-ary (in particular, Boolean) functions. However, all of the other
measures  defined  in  Section  2.3.1  are  directly  applicable  to  the  general 
case. 


Example 2.10. Let f be a partial function of three variables,
f: {0, 1, 2} × {0, 1}^2 → {a, b, c}, given by the following mapping:

f: (0, 0, 0) → a    (1, 0, 0) → a    (2, 0, 1) → b
   (0, 0, 1) → a    (1, 0, 1) → b    (2, 1, 0) → a
   (0, 1, 1) → c    (1, 1, 0) → c    (2, 1, 1) → b

Let the variables' costs be as follows:

                 x1   x2   x3
storage costs     1    2    3
testing costs     5    3    2

Finally, let the probability distribution, p, be given by:

p: (0, 0, 0) → 0.00    (1, 0, 0) → 0.10    (2, 0, 0) → 0.00
   (0, 0, 1) → 0.05    (1, 0, 1) → 0.05    (2, 0, 1) → 0.20
   (0, 1, 0) → 0.00    (1, 1, 0) → 0.10    (2, 1, 0) → 0.10
   (0, 1, 1) → 0.10    (1, 1, 1) → 0.15    (2, 1, 1) → 0.15


Three  of  the  combinations  are  assigned  a  probability  of  zero;  this, 
however,  does  not  imply  that  those  events  are  impossible,  but  merely 
that  they  are  extremely  rare.  (This  makes  provision  for  the  fact  that 
probability estimates must suffer from inaccuracies.) An impossible
event (one that would result in a contradiction) will have a zero
probability  and  no  specified  mapping;  for  this  function,  (0,  1,  0)  and 
(2,  0,  0)  can  be  considered  impossible  events,  while  (0,  0,  0)  is  merely 
rare.  Figure  2.5  shows  two  decision  trees  (with  leaf  probabilities) 
and  associated  decision  diagrams  for  the  function.  The  various  measures 
defined  in  Section  2.3.1  are  tabulated  below. 

measure                 α    β    h    η    H       E
tree and diagram (a)   13   10   10   72   9       8.8
tree and diagram (b)    9    9   10   69   8.625   7.85

□

It  is  noted  that  a  decision  tree  will,  in  general,  make  arbitrary 
assignments  of  values  to  some  of  the  combinations  for  which  no  mapping 
was  specified;  in  fact,  it  is  always  possible  to  find  a  decision  tree 
that  leaves  no  unspecified  entry  for  functions  of  binary  variables. 
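
The tabulated measures of Example 2.10 can be reproduced with the sketch below. It rests on stated assumptions: Figure 2.5 is not reproduced here, so the two trees are reconstructions chosen to be consistent with the mapping of the example; the diagram storage cost is computed by counting structurally identical subtrees once; and the names `measures` and `expected_cost` are illustrative.

```python
STORAGE = (1, 2, 3)   # storage costs of x1, x2, x3
TESTING = (5, 3, 2)   # testing costs of x1, x2, x3

P = {
    (0, 0, 0): 0.00, (1, 0, 0): 0.10, (2, 0, 0): 0.00,
    (0, 0, 1): 0.05, (1, 0, 1): 0.05, (2, 0, 1): 0.20,
    (0, 1, 0): 0.00, (1, 1, 0): 0.10, (2, 1, 0): 0.10,
    (0, 1, 1): 0.10, (1, 1, 1): 0.15, (2, 1, 1): 0.15,
}

# Assumed encoding: (variable index, one subtree per value); leaves are images.
TREE_A = (0, (1, "a", "c"), (1, (2, "a", "b"), "c"), (1, "b", (2, "a", "b")))
TREE_B = (2, (1, "a", (0, "a", "c", "a")), (0, (1, "a", "c"), "b", "b"))

def measures(tree):
    """Return (alpha, beta, h, eta, H) for a tree and its associated diagram."""
    nodes, paths = [], []

    def walk(node, cost):
        if isinstance(node, tuple):
            nodes.append(node)
            for child in node[1:]:
                walk(child, cost + TESTING[node[0]])
        else:
            paths.append(cost)

    walk(tree, 0)
    alpha = sum(STORAGE[n[0]] for n in nodes)
    beta = sum(STORAGE[n[0]] for n in set(nodes))   # common subtrees built once
    eta = sum(paths)
    return alpha, beta, max(paths), eta, eta / len(paths)

def expected_cost(tree):
    total = 0.0
    for combo, prob in P.items():
        node, cost = tree, 0
        while isinstance(node, tuple):
            cost += TESTING[node[0]]
            node = node[1 + combo[node[0]]]
        total += prob * cost
    return total

print(measures(TREE_A), round(expected_cost(TREE_A), 2))  # (13, 10, 10, 72, 9.0) 8.8
print(measures(TREE_B), round(expected_cost(TREE_B), 2))  # (9, 9, 10, 69, 8.625) 7.85
```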

Further  generalizations  to  relations,  recursive  relations,  and  tree 
hierarchies  will  be  considered  in  Chapter  3. 

2.3.3.  Decision  tables 

The  terminology  used  for  decision  tables  in  the  following  is  that 
of  (Metzner  77) . 

A  decision  table  is  an  organizational  or  programming  tool.  It  can 
be  viewed  as  a  matrix  where  the  upper  rows  specify  sets  of  conditions 
and  the  bottom  ones  sets  of  actions  to  be  taken  when  the  corresponding 
conditions are satisfied; thus, each column, called a rule, describes a
procedure of the type "if conditions, then actions." Usually, each
condition and action is used as a label on the appropriate row and a rule
is specified by entering values in the condition rows (or blanks, for
irrelevant conditions, called "don't care") and check marks (meaning
"perform") in the action rows.

Figure 2.5. The two decision trees and associated decision
diagrams of Example 2.10.

Example 2.11. The following decision table describes how to spend
a Saturday afternoon in spring. It has two condition rows, three action
rows, and four rules; the first condition is a binary variable (taking
values from the set {yes, no}), while the second is a ternary variable
(taking values from the set {calm, breezy, windy}).

                           rule 1   rule 2   rule 3   rule 4
Raining?                   yes      no       no
Wind condition                      breezy   calm     windy
Clean basement             ✓                          ✓
Spade garden                                 ✓
Fly kites with children             ✓

The four rules can be read as:

"if it is raining, then clean the basement";
"if it is breezy and not raining, then fly kites with the children";
"if it is calm and not raining, then spade the garden";
"if it is windy, then clean the basement."

A pair of rules overlaps if a combination of condition values can
be found that satisfies the condition sets of both rules. If two
overlapping rules specify different actions, they are called inconsistent
and their table is said to be ambiguous; if they specify identical
actions, they are termed redundant. The order in which rules appear in
the  table  is  normally  irrelevant.  An  exception  to  that  case  is  the 
so-called  else  rule,  which  always  appears  in  the  last  column  of  the 
table;  such  a  rule  has  no  condition  entries  and  is  to  be  used  when  no 
other  rule  in  the  table  can  be  applied. 

Example 2.12. In the decision table of Example 2.11, rules 1 and 4
overlap because they are both applicable when it is raining and windy.
Since they specify the same action set, they are redundant, and since
no other rules overlap, the table is unambiguous. That same table can
be rewritten using an else rule, thereby considerably simplifying it, as
shown below.

                           rule 1   rule 2   else
Raining?                   no       no
Wind condition             breezy   calm
Clean basement                               ✓
Spade garden                        ✓
Fly kites with children    ✓


Tables  with  an  else  rule  are  examples  of  complete  tables,  which  have 
an  applicable  rule  for  every  combination  of  conditions. 
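
The overlap test of Example 2.12 is easy to mechanize. In the sketch below the encoding of a rule as a pair (conditions, actions), the use of `None` for a blank entry, and the function names are all assumptions; the rules themselves are those of Example 2.11.

```python
from itertools import combinations

# Rules of Example 2.11; None marks a blank ("don't care") entry.
RULES = [
    ({"raining": "yes", "wind": None},     {"clean basement"}),
    ({"raining": "no",  "wind": "breezy"}, {"fly kites with children"}),
    ({"raining": "no",  "wind": "calm"},   {"spade garden"}),
    ({"raining": None,  "wind": "windy"},  {"clean basement"}),
]

def overlaps(c1, c2):
    """Two condition sets overlap if no condition row separates them."""
    return all(c1[k] is None or c2[k] is None or c1[k] == c2[k] for k in c1)

def classify(rules):
    """Return (redundant pairs, inconsistent pairs), rules numbered from 1."""
    redundant, inconsistent = [], []
    for (i, (c1, a1)), (j, (c2, a2)) in combinations(enumerate(rules, 1), 2):
        if overlaps(c1, c2):
            (redundant if a1 == a2 else inconsistent).append((i, j))
    return redundant, inconsistent

print(classify(RULES))   # ([(1, 4)], [])
```

An empty second list means the table is unambiguous; this check sees only the entries written in the table and knows nothing of semantic implications between conditions.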

Decision  tables  described  so  far  are  in  so-called  extended-entry 
form.  Sometimes,  however,  it  is  required  that  all  conditions  be  Boolean 
variables;  this  gives  rise  to  limited-entry  decision  tables.  Although 
most such tables are set up in limited format from their conception, it
may be necessary to convert extended-entry tables to limited-entry
format;  this  is  often  done  by  using  one  Boolean  variable  for  each  value 
(but  one)  of  the  multivalued  variable  to  be  replaced  (Press  65) .  This 
process  results  in  tables  where  entries  in  one  condition  row  often  imply 
(absent)  entries  in  others;  such  implied  entries  can  also  be  present  in 
any  decision  table  and  give  rise  to  apparent  (but  inexistent)  ambiguity. 
Since  the  implications  result  from  purely  semantic  considerations,  they 
cannot  be  detected  by  an  automatic  processor;  thus,  it  is  imperative  that 
they  be  specified  whenever  the  table  must  be  logically  checked  or 
translated.  The  impossible  combinations  of  conditions  will  then  be 
treated (even in tables with an else rule) as inputs with unspecified
mapping. 

Example 2.13. The decision table of Example 2.12, converted to
limited entry, is shown below (with an else rule); it still has three
action rows and three rules, but now has three condition rows.

Implied entries are shown in parentheses; their absence, while not
confusing to a human, would induce an automatic processor to decide that
the first two rules are inconsistent, since both could apparently apply
when it is not raining and is calm and breezy. The contradiction
inherent in the last two conditions is of semantic origin and thus
undetectable by a machine. It is noted, however, that the specification
of implied entries in the above table is insufficient: while it
identifies the impossible combination (no, yes, yes), it fails to
identify the equally impossible combination (yes, yes, yes), which will
be erroneously included in the else rule. This suggests that logical
inconsistencies be separately listed; for instance, the above table would
be supplemented by the logical expression NOT (breezy = yes AND calm
= yes). □

Further  information  about  the  table  is  often  provided  in  the  form 
of  implementation  and  testing  costs  for  the  conditions  and  a  probability 
distribution  on  the  rules;  when  all  rules  are  simple  (that  is,  each 
applies  to  a  single  combination  of  conditions)  and  the  table  is  complete, 
this  distribution  is  equivalent  to  one  specified  on  the  combinations  of 
conditions.

It  should  now  be  clear  that  an  unambiguous  extended-entry  decision 
table  is  a  special  case  of  a  partial  function  of  multivalued  variables, 
where  the  conditions  correspond  to  the  variables  and  the  sets  of  actions 
to  the  function  values.  In  particular,  a  complete  decision  table 
corresponds to a completely specified function, and a limited-entry
decision table corresponds to a function of binary variables. An
ambiguous  decision  table  can  be  assimilated  to  a  relation,  a  case 
discussed  in  Chapter  5.  A  decision  tree  representation  of  a  decision 
table,  usually  called  a  sequential  testing  procedure,  is  then  of 
particular  importance,  as  it  corresponds  to  an  implementation,  usually 
in  software,  of  the  decision  table.  Indeed,  the  importance  of  the 
limited-entry  format  is  in  good  part  due  to  the  ease  of  programming 
binary  decisions  (by  if-then-else  constructs) . 

2.3.4.  Binary  identification  problems 

Identification  is,  of  course,  a  fundamental  problem  in  many  human 
endeavors.  Of  particular  interest  is  the  situation  where  an  unknown 
event  or  specimen  is  to  be  classified  into  one  of  a  finite  number  of 
categories,  based  upon  the  outcome  of  a  number  of  tests.  [This  is  a 
special  case  of  the  concept  of  questionnaire  (Picard  72) .]  Such  problems 
arise  in  biology,  medical  diagnosis,  machine  troubleshooting,  and 
numerous  pattern  recognition  applications.  A  binary  identification 
problem  includes  only  binary  tests. 

Formally,  a  binary  identification  problem  [as  defined  in  (Garey  72)] 
consists  of: 

- a finite set of objects (or categories), {O_1, ..., O_n}, which
represents the universe of possible identifications;

- an optional probability distribution function on that set of
objects (if absent, the distribution is taken to be uniform);

- a finite set of tests (or questions), {Q_1, ..., Q_m}, each of
which is a subset of the set of objects (thereby listing the
possible identifications for the unknown object as determined by
a positive answer to that particular test);

-  optional  sets  of  storage  and  testing  costs  associated  with  the 
set  of  questions  (when  unspecified,  costs  are  taken  to  be  unity). 

In  most  cases,  the  size  of  the  set  of  objects,  n,  is  larger  than  the 
size  of  the  set  of  questions,  m.  A  solution  to  such  a  problem  is  an 
identification  procedure,  that  is,  a  decision  tree  where  internal  nodes 
are  associated  with  questions  and  leaves  with  objects. 


Example 2.14. Let a binary identification problem be given by a
set of four objects, {O_1, O_2, O_3, O_4}, with respective probabilities,
{0.1, 0.2, 0.3, 0.4}, and a set of three tests, {Q_1, Q_2, Q_3}, with
■  {Oj^,  O2},  02  *  °4^’  ^3  "  unity  costs.  Two 

identification  procedures  for  this  problem  are  shown  in  Figure  2.6. 


Figure 2.6. The two identification procedures of Example 2.14.

An identification problem is clearly a special case of a partial
function  of  binary  variables,  where  the  questions  correspond  to  the 
variables  and  the  objects  to  the  function  values.  In  fact,  a  binary 
identification  problem  corresponds  to  an  injective  and  surjective 
partial  function,  since  exactly  one  combination  of  variables'  values 
is  mapped  to  each  object. 

As  a  result,  decision  trees  for  binary  identification  problems 
have  a  fixed  number  of  leaves  (one  per  object)  and  thus  of  internal 
nodes  (since  the  number  of  internal  nodes  of  a  full  binary  tree  is  one 
less  than  the  number  of  its  leaves).  Since  no  two  leaves  are  identical, 
there can be no common subtrees, so that decision diagrams for
identification problems are decision trees. Another consequence is that,
when  storage  costs  are  unity,  all  decision  trees  for  the  problem  have 
the  same  tree  (and,  of  course,  diagram)  storage  cost  (the  number  of 
objects  minus  one) . 

Example  2.15.  The  various  tree  and  diagram  measures  defined  in 
Section  2.3.1  are  tabulated  below  for  the  two  trees  of  Figure  2.6. 

measure    α   β   h   η   H      E
tree (a)   3   3   2   8   2      2
tree (b)   3   3   3   9   2.25   1.9

Thus,  the  second  tree  has  a  lesser  expected  testing  cost  than  the  first, 
even though its height is greater and its root is associated with the
redundant  variable,  x^  (which  the  first  tree  does  not  test  at  all).  It 
is noted that H is in constant ratio to η, since the number of leaves
is constant. If it is now assumed that the probability distribution is
uniform, then E becomes equal to H, so that only three essentially
different measures are left (not counting the profiles): α, h, and E.


CHAPTER  3 


SURVEY  OF  PREVIOUS  WORK 


3.1.  Introduction 

The  choice  of  decision  trees  as  models  of  functions  of  discrete 
variables  raises  several  questions.  First,  does  this  model  capture 
important  aspects  of  functions  that  are  not  reflected  in  other  models 
(such  as  Boolean  calculus  for  logic  functions)?  Next,  since  several 
measures  have  been  used  on  decision  trees,  what  can  be  said  of  those 
measures when applied to functions, and how can optimal tree
representations be developed? Finally, is it possible to develop from the
model  a  measure  of  complexity  (as  independent  of  the  model  as  possible) 
and  to  apply  it  to  practical  problems? 

Since  decision  trees  can  be  regarded  as  sequential  evaluation 
procedures,  most  of  the  published  results  about  them  concern  the 
conversion  of  parallel  data  to  optimal  and  suboptimal  decision  trees. 
Three  principal  lines  of  investigation  have  progressed  independently  so 
that  efforts  have  often  been  duplicated.  In  the  following  sections,  each 
area  will  be  reviewed  separately;  however,  connections  will  be  explicitly 
mentioned  and  all  cases  will  be  expressed  in  the  general  framework 
introduced  in  Section  2.3. 

The  problem  of  converting  a  discrete  function  to  an  optimal  decision 
tree  is  a  difficult  one  and  its  exact  complexity  remained  unknown  for 
years; therefore, both optimal and heuristic, suboptimal algorithms
abound. Only the most significant will be reviewed here; as will be
seen, they are representative of a very few basic techniques which, with
slight variations, comprise almost all of the proposed algorithms. The
case of binary identification, as found in taxonomy and machine diagnosis,
is first reviewed, since it constitutes a rather restricted subproblem
and also because it was the first to be investigated. [Indeed, the use
of decision trees in biology, where they are called diagnostic keys, is
thought to go back to Aristotle and Theophrastus (Morse 71).] The
conversion  of  decision  tables  to  computer  programs,  which  has  engendered 
a  wealth  of  articles,  is  surveyed  next.  The  more  general  problem  of 
representing  discrete  functions  by  decision  trees  and  diagrams,  which  has 
been  studied  in  the  context  of  switching  theory,  pattern  recognition,  and 
concrete  complexity  theory,  is  then  reviewed.  The  various  findings  are 
regrouped  in  a  short  summary  of  the  "state-of-the-art"  knowledge  about 
decision  trees. 

3.2.  Diagnosis  and  Identification 

The  simplest  form  of  sequential  evaluation  is  a  linear  sequence 
where,  at  each  step,  only  one  path  leads  to  another  test.  This 
corresponds  to  a  degenerate  tree  with  a  number  of  internal  nodes  equal 
to  its  height.  Finding  an  optimal  tree  then  reduces  to  choosing  one  of 
the test sequence permutations; moreover, the only applicable criterion
is the expected testing cost. Variants of this problem were studied by
(Johnson 56, Hoehn 58, Riesel 63, Slagle 64, Hanani 77), who gave simple
necessary and sufficient conditions for an ordering (in terms of the
ratio of cost to probability) that minimizes the expected testing cost
(or time).
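
A minimal sketch of such an ordering rule follows. It assumes a simplified model that is not spelled out in the text: tests are applied one after another until one succeeds, test t costs cost[t] and succeeds with probability prob[t], and the successes are mutually exclusive. Under that assumption, sorting by increasing ratio of cost to probability is checked against an exhaustive search; the numbers are hypothetical.

```python
from itertools import permutations

def expected_cost(order, cost, prob):
    """Expected cost when tests are applied in `order` until one succeeds."""
    total, spent = 0.0, 0.0
    for t in order:
        spent += cost[t]           # cost accumulated up to and including test t
        total += prob[t] * spent   # paid whenever test t is the one that succeeds
    return total

def ratio_order(cost, prob):
    """Tests sorted by increasing ratio of cost to probability."""
    return tuple(sorted(range(len(cost)), key=lambda t: cost[t] / prob[t]))

cost = [4.0, 1.0, 3.0]   # hypothetical testing costs
prob = [0.5, 0.2, 0.3]   # hypothetical success probabilities

best = min(permutations(range(3)), key=lambda o: expected_cost(o, cost, prob))
print(ratio_order(cost, prob), best)   # (1, 0, 2) (1, 0, 2)
```

Exchanging two adjacent tests changes the expected cost by the difference of the cross products of cost and probability, which is why comparing ratios suffices in this model.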

In  a  typical  application  to  machine  diagnosis  or  specimen 
identification,  many  more  variables  are  considered  than  are  needed  to 
distinguish  all  values.  Thus,  it  makes  sense  to  attempt  to  minimize 
the  size  of  the  set  of  variables  (or  its  cost  if  sensor  costs — as 
defined  in  Section  2.3.1 — have  been  specified).  This  corresponds  to 
finding  a  tree  of  minimum  height  when  the  order  of  testing  is  constrained 
to  be  the  same  on  all  branches  of  the  tree;  the  corresponding  decision 
problem  was  recently  proved  to  be  NP-complete  (Garey  79,  p.  222).  [Thus, 
the  simple  exhaustive  search  method  proposed  in  (Willcox  72)  is  not  much 
worse  than  what  might  reasonably  be  expected.]  The  complexity  of  an 
optimal  algorithm  was  recognized  early  and  a  heuristic  selection 
criterion  developed  (Gyllenberg  63,  Chang  65,  Chang  70).  According  to 
this  criterion,  the  first  variable  selected  is  that  which  gives  rise  to 
the  largest  number  of  pairs  of  values;  since  this  happens  when  the  set 
of  values  is  divided  into  subsets  of  most  nearly  equal  sizes,  this  is  an 
example  of  a  splitting  algorithm.  Once  a  variable  is  selected,  the  same 
computations  are  carried  out  independently  on  the  resulting  subsets  and 
the  results  added  to  determine  the  next  variable  to  be  selected.  The 
criterion  was  enlarged  in  (Chang  65)  to  include  the  case  of  arbitrary 
partial  functions  of  binary  variables;  in  this  case,  only  the  pairs  of 
distinct  values  are  tallied.  No  results  were  published  on  the  performance 
of this selection algorithm.

It is noted that this criterion is easily extended to take into
account sensor costs by using the ratio of the number of pairs of
(distinct) values to the sensor cost as the critical measure.
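
The pair-counting criterion can be sketched as follows for an identification problem, where every object is a distinct value. The objects, the tests, and the function names are hypothetical; a test is represented, as in Section 2.3.4, by the subset of objects for which it answers positively.

```python
def separated_pairs(test, objects):
    """Number of pairs of objects that the binary test tells apart."""
    inside = len(test & objects)
    return inside * (len(objects) - inside)

def select_test(tests, objects):
    """Name of the test that separates the largest number of pairs."""
    return max(tests, key=lambda name: separated_pairs(tests[name], objects))

objects = frozenset(range(1, 7))   # six hypothetical objects
tests = {
    "Q1": frozenset({1}),
    "Q2": frozenset({1, 2, 3}),
    "Q3": frozenset({2, 3, 4, 5, 6}),
}

print({q: separated_pairs(s, objects) for q, s in tests.items()})
# {'Q1': 5, 'Q2': 9, 'Q3': 5}
print(select_test(tests, objects))   # Q2
```

The even split wins, which is the splitting behavior described above; dividing each count by a sensor cost before taking the maximum would give the extension to sensor costs.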

The  problem  of  constructing  optimal  binary  identification  trees  was 
first  seriously  considered  in  (Brule  60,  Kletsky  60) .  In  the  first 
paper,  the  number  of  different  decision  trees  (called  sequential  test 
diagrams) for n objects with all 2^n - 1 binary tests available was
estimated to be of the order of 1.78^n · n! and that of distinct leaf
profiles with n leaves of the order of 1.84^n (although the concept of
leaf profile was not mentioned). The authors also defined the expected
and worst case testing costs, E and h; in particular, they showed that
when all 2^n - 1 tests are available and costs are unity, the tree
corresponding  to  the  Huffman  code  (Huffman  52)  minimizes  the  expected 
testing  cost  [a  result  previously  derived  in  (Zimmerman  59)]. 
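
For illustration, the expected number of tests of a Huffman tree can be computed without building the tree, since each merge of the two least probable subtrees adds one test to every leaf below it. The sketch uses Python's standard `heapq` module and the object probabilities of Example 2.14; the function name is an assumption.

```python
import heapq

def huffman_expected_tests(probs):
    """Expected number of binary tests of a Huffman tree for `probs`."""
    heap = list(probs)
    heapq.heapify(heap)
    expected = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        expected += merged   # every leaf under this merge gains one level
        heapq.heappush(heap, merged)
    return expected

print(round(huffman_expected_tests([0.1, 0.2, 0.3, 0.4]), 2))   # 1.9
```

The value agrees with the expected testing cost listed for tree (b) in Example 2.15.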

The  analogy  with  coding  and  information  theory  was  further  pursued 
in  the  second  paper,  where  the  entropy  of  a  function  was  defined. 
Expressed in the notation developed in Section 2.3, the entropy of any
partial function of n variables, $f(x_1, \ldots, x_n)$, is the quantity:

$$H(f) = -\sum_{v} p(f = v) \cdot \log_2 p(f = v),$$

where $p(f = v)$ is the probability that f takes the value v (i.e., the
sum of the probabilities of all n-tuples, $(x_1, \ldots, x_n)$, that are
mapped to v) and the sum is taken over all values, v, in the range of f.
The entropy of a function can be considered as expressing its initial
ambiguity or its information content [as in (Rescigno 61), where it is
called "repartment"]. The authors then proposed to rank the variables
of a function by the ratio of the ambiguity they remove to their cost,
where the ambiguity removed by a variable, $x_i$, which takes on $m_i$ values,
can be expressed as:

$$H(x_i) = -\sum_{j=0}^{m_i - 1} p(x_i = j) \cdot \log_2 p(x_i = j).$$

Thus, the ambiguity removed by a variable also has the form of an
entropy; indeed, it was independently derived as an entropy function in
(Mandelbaum 64).
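
As an illustration of the two entropy definitions above, the following Python sketch computes the entropy of a function and the ambiguity removed by one of its variables. The function names and the dictionary encoding of the probability distribution are assumptions made for this example; they are not notation taken from the cited papers.

```python
from collections import defaultdict
from math import log2


def entropy(probabilities):
    """Shannon entropy (base 2) of a collection of probabilities."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


def function_entropy(f, dist):
    """H(f): dist maps each n-tuple of the domain to its probability."""
    mass = defaultdict(float)
    for x, p in dist.items():
        mass[f(x)] += p  # p(f = v) is the total mass of tuples mapped to v
    return entropy(mass.values())


def variable_entropy(i, dist):
    """H(x_i): entropy of the marginal distribution of the i-th variable."""
    mass = defaultdict(float)
    for x, p in dist.items():
        mass[x[i]] += p
    return entropy(mass.values())
```

With unit costs, ranking the variables by `variable_entropy` is the selection step of the information algorithm discussed next; with nonunit costs, each value would be divided by the cost of the corresponding test.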

Using the removed ambiguity per unit cost as a selection criterion
in constructing decision trees yields the information algorithm, of
which a large number of published heuristics are special cases. In
particular, when costs are unity and the probability distribution is
uniform, the information algorithm chooses that variable which partitions
the image of the function into the most nearly equal subsets; this is
recognized as a special case of the splitting algorithm. Numerous
publications make use of one or the other of these algorithms, with
appropriate modifications, to solve problems in machine diagnosis and
biological classifications (La Macchia 62, Winston 69, Pankhurst 70,
Morse 71, Gower 72). However, no analysis of the criteria was published
until 1974, when (Garey 74) showed that the ratio of the expected testing
cost of a tree constructed by the splitting algorithm to that of an
optimal tree could be arbitrarily large, even in the case of uniformly
distributed objects. [For n objects the ratio can be as large as
$\log_2 n / \log_2 \log_2 n$; if arbitrary probability distributions are allowed, the
ratio can be at least n (Garey 80)]. This disproved a longstanding
conviction that the splitting algorithm was optimal for uniform
distributions, as quoted in (Brule 60, Kletsky 60, Osborne 63). However,
(Hung 74) showed that the splitting algorithm is asymptotically optimal,
in the sense that, as the number of variables grows, the expected value
of the ratio of the cost of the trees constructed by the splitting
algorithm to that of the optimal trees converges to one.
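
The splitting rule described above can be sketched in a few lines of Python for a binary identification problem with unit costs and uniformly distributed objects. The representation of tests as sets of objects, the function name `splitting_tree`, and the tie-breaking rule (first test in list order) are assumptions of this sketch rather than details taken from the cited papers.

```python
def splitting_tree(objects, tests, depth=0):
    """Build a tree with the splitting rule; return {object: leaf depth}.

    objects is a frozenset; each test is the frozenset of objects for
    which the test answers "yes".
    """
    if len(objects) == 1:
        return {next(iter(objects)): depth}
    # Keep only the tests that actually separate the remaining objects.
    useful = [t for t in tests if objects & t and objects - t]
    if not useful:
        raise ValueError("the remaining objects cannot be distinguished")
    # Splitting rule: choose the test giving the most nearly equal subsets.
    best = min(useful, key=lambda t: abs(len(objects & t) - len(objects - t)))
    depths = splitting_tree(objects & best, tests, depth + 1)
    depths.update(splitting_tree(objects - best, tests, depth + 1))
    return depths
```

Summing the returned depths gives the path length of the constructed tree, which can then be compared with that of an optimal tree on small examples.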

Algorithms  for  constructing  a  decision  tree  with  minimal  expected 
testing  cost  were  presented  in  (Garey  72,  Misra  72)  (the  first  for  binary 
tests,  the  second  for  multivalued  tests).  Both  algorithms  make  use  of 
dynamic  programming  and  may  require  time  exponential  in  the  size  of  the 
input  (in  contrast  to  the  quasi-linear  splitting  and  information 
algorithms).  There  is,  however,  little  likelihood  of  improvement  since 
the  decision  problem  associated  with  the  construction  of  optimal  decision 
trees  for  partial  bijective  functions  is  now  known  to  be  NP-complete 
(Hyafil  76,  Loveland  79). 

3.3.  Decision  Tables 

The  conversion  of  decision  tables  to  computer  programs  using  decision 
trees  has  been  the  subject  of  numerous  articles  over  the  past  fifteen 
years.  However,  a  majority  of  these  articles  repeat  the  results  presented 
in  others  or  contain  erroneous  statements  (often  masked  by  a  different — 
and  sometimes  cryptic — notation).  Consequently,  only  those  articles  of 




actual  or  historical  importance  will  be  reviewed.  Further  references 
are  available  in  the  book  by  (Metzner  77) ,  the  survey  paper  of 
(Pooch  74),  or  the  special  issue  of  SIGPLAN  Notices  (September  71). 

The  subject  was  first  studied  in  (Montalbano  62).  Two  desirable 
objectives  are  mentioned,  namely  minimizing  storage  cost  and  expected 
testing  cost;  to  that  effect,  two  heuristic  selection  strategies  are 
presented.  The  first  strategy  selects  variables  which  tend  to  maximize 
the  profile  of  the  tree  by  reaching  leaves  as  soon  as  possible;  this  is 
claimed  (erroneously,  as  will  be  seen  in  Chapter  4)  to  minimize  storage 
cost.  The  second  rule  (called  "delayed  rule")  is  intended  to  minimize 
the  expected  testing  cost  by  selecting  at  each  step  that  variable  which 
divides  the  decision  table  into  most  nearly  equal  subtables;  hence,  it 
is  a  special  case  of  the  splitting  algorithm.  This  rule  was  refined  in 
(Pollack  65)  to  take  into  account  the  fact  that  several  tables  may 
represent  the  same  decision  process;  it  was  proposed  to  minimize  the 
number  of  rules  leading  to  the  same  set  of  actions  by  the  Quine-McCluskey 
algorithm,  thereby  yielding  a  minimal  equivalent  table  (which,  however, 
is  still  not  unique).  The  author  overlooked  the  fact  that  the 
Quine-McCluskey algorithm requires exponential time [in fact, the
minimization  of  a  limited-entry  decision  table  or  of  a  Boolean  formula 
is  an  NP-hard  problem  (Masek  80)],  so  that  his  suboptimal  algorithm 
requires  exponential  time,  just  like  an  exhaustive  search  for  the  optimal 
tree.  The  same  oversight  can  be  found  in  several  otherwise  important 
articles  (Ganapathy  73,  Shwayder  75). 

Pollack's algorithm for minimizing the expected testing cost
consists  of  several  rules  of  thumb,  which  are  roughly  equivalent  to  the 
information  algorithm.  This  was  recognized  by  (Shwayder  71),  who 
proposed  entropy  as  a  selection  criterion  for  limited-entry  decision 
tables  with  one  rule  per  action  set;  the  full  information  algorithm  was 
presented  in  (Ganapathy  73,  Shwayder  74).  Unfortunately,  both  authors 
use  the  decision  table  rather  than  its  underlying  function  as  a  basis 
for  the  algorithm;  in  consequence,  there  are  problems  of  nonuniqueness 
of  representation  and  of  table  minimization. 

Further  heuristic  selection  rules  for  minimizing  the  expected 
testing  cost  were  presented  by  (Verhelst  72,  Sethi  80).  The  first  author 
based  his  criterion  on  a  lower  bound  estimate,  later  shown  to  be  incorrect 
(King  74) ,  while  the  second  author  used  a  one-step  look  ahead  with  what 
amounts  to  an  information  algorithm. 

Montalbano's  "quick  rule"  for  minimizing  storage  cost  was  also 
successively  refined  by  several  researchers.  In  (Rabin  71),  it  is 
proposed  to  select  that  variable  which  results  in  a  minimum  number  of 
rules  being  split;  this  requires  that  a  minimal  disjoint  table  be  first 
obtained.  The  idea  is  extended  to  multivalued  conditions  in 
(Michalski 78), and refined by considering the minimum number of disjoint
rules necessary in the original table and that necessary for the subtables
determined by the variable under consideration. Neither author
discussed the problem of generating an equivalent table with a minimum
number of disjoint rules; this is again a fatal flaw, since that problem
is itself NP-hard. A similar analysis was performed in (Yasui 71,
Yasui 72) for limited-entry tables; it was shown that, when applied to
limited-entry tables with n conditions and unity storage costs, this
selection algorithm (called iterated local minimization) may construct
a tree with at least $2^{n-4}$ more nodes than the optimal tree; a comparison
with Montalbano's original strategy showed that each algorithm can
construct trees with $2^{n-4}$ more nodes than those constructed by the other.

The  first  thorough  analysis  of  the  conversion  of  limited-entry 
decision  tables  to  decision  trees  and  diagrams  was  a  two  part  article  by 
(Reinwald  66,  Reinwald  67),  whose  excellent  work  was  unfortunately 
ill served by an exceedingly complex notation. In the first part, the
authors  derived  a  lower  bound  on  the  expected  testing  cost  of  partial 
decision  trees  as  a  function  of  the  variables  already  tested  (which  will 
be  further  examined  in  Chapter  5)  and  used  it  as  the  basis  for  a  fast 
suboptimal  strategy  (of  local  optimization)  and  for  an  optimal  seeking 
branch-and-bound  procedure  (which,  unfortunately,  may  require  exponential 
time).  In  the  second  part,  a  simple  lower  bound  on  the  storage  cost  of 
a  partial  decision  tree  was  derived  in  terms  of  the  irredundant  conditions 
and  used  again  for  a  fast  local  optimization  procedure  and  a  (sometimes 
exponential)  branch-and-bound  algorithm;  the  authors  further  showed  how 
to  modify  the  branch-and-bound  criterion  to  obtain  decision  diagrams 
with  minimum  storage  cost.  This  latter  procedure  remains  to  this  date 
the  only  one  applicable  to  decision  diagrams.  It  is  noted  that,  in 
both  papers,  the  authors  used  a  table  where  all  rules  are  simple,  which 
is  equivalent  to  giving  the  mapping  of  the  underlying  function.  This 
guarantees  that  the  table  has  a  unique  representation  and  avoids  the 
need  for  minimization. 

The problem of converting limited-entry decision tables to optimal
decision trees finally received an efficient solution in 1973 with the
publication of a dynamic programming algorithm (Bayes 73). This
solution was independently rediscovered by (Schumacher 76) who generalized
it to extended-entry tables; a somewhat faster version (due to the use of
added heuristics inspired from game trees) was recently published
(Martelli 78). As is the case for dynamic programming solutions, the
algorithm builds the optimal tree from the leaves up by identifying
optimal subtrees; it is thus applicable to any criterion of optimality
with the property that an optimal solution must have optimal subsolutions.
In particular, it can be used to find the optimal trees with respect to
the tree storage cost, the expected and worst case testing costs, and
the minimum reverse and maximum profiles, but not with respect to the
diagram storage cost nor with the total and normalized testing costs.
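
The principle of optimal subsolutions can be illustrated by the following Python sketch, which computes the minimum expected testing cost of a completely specified function under unit costs and a uniform distribution. It is a memoized top-down recursion over the same nodes (combinations of tested and untested variables) rather than the bottom-up procedure of the cited papers, and the function name and input encoding are assumptions of this example.

```python
from functools import lru_cache
from itertools import product
from math import inf


def optimal_expected_cost(f, n, k=2):
    """Minimum expected number of tests needed to evaluate f.

    f maps an n-tuple over {0, ..., k-1} to a value; inputs are assumed
    uniformly distributed and every test is assumed to cost one unit.
    """

    @lru_cache(maxsize=None)
    def cost(node):
        # A node is an n-tuple whose entries are either a fixed value
        # (tested variable) or None (untested variable).
        free = [i for i, v in enumerate(node) if v is None]
        values = set()
        for combo in product(range(k), repeat=len(free)):
            point = list(node)
            for i, v in zip(free, combo):
                point[i] = v
            values.add(f(tuple(point)))
        if len(values) == 1:
            return 0.0  # f is constant on this node: it is a leaf
        best = inf
        for i in free:
            children = (node[:i] + (v,) + node[i + 1:] for v in range(k))
            # One test now, then the optimal cost of each equally likely child.
            best = min(best, 1 + sum(cost(c) for c in children) / k)
        return best

    return cost((None,) * n)
```

Replacing the averaging step by a maximum would yield the worst case testing cost instead; both criteria share the property that an optimal tree has optimal subtrees.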

The running time of the algorithm is of the order of the number of
possible nodes that are examined, where each node corresponds to a
combination of tested and untested variables. For a function of n k-ary
variables, there are $(k + 1)^n$ distinct nodes and generating them from
the bottom $k^n$ nodes (all variables tested) to the top node (all variables
untested) requires a number of steps equal to:

$$\sum_{i=1}^{n} i \cdot \binom{n}{i} \cdot k^{n-i} = n \cdot (k + 1)^{n-1},$$

since a node with i untested variables has i possible parents (each with
one more untested variable) and can be chosen in $\binom{n}{i} \cdot k^{n-i}$ ways. Since
the input is of size $I = k^n$, the dynamic programming algorithm
requires $O(I^{\log_k (k+1)} \cdot \log I)$ steps and is thus very efficient. (For binary
variables, the complexity is $O(I^{1.585} \cdot \log I)$; when k becomes very large,
the complexity converges to $O(I \cdot \log I)$.)
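
The counting identity and the exponent quoted for binary variables can be checked numerically. The short Python sketch below does so for small n and k; the function names are chosen for this example only.

```python
from math import comb, log


def generation_steps(n, k):
    """Sum over i of i * C(n, i) * k**(n - i), for i = 1, ..., n."""
    return sum(i * comb(n, i) * k ** (n - i) for i in range(1, n + 1))


def closed_form(n, k):
    """The closed form n * (k + 1)**(n - 1) of the sum above."""
    return n * (k + 1) ** (n - 1)


def size_exponent(k):
    """Exponent log_k(k + 1) of the input size I = k**n in the bound."""
    return log(k + 1) / log(k)
```

For k = 2, `size_exponent` evaluates to approximately 1.585, and it decreases toward one as k grows, in agreement with the two special cases mentioned above.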

None  of  the  heuristic  or  optimal  algorithms  mentioned  above  is 
directly  applicable  to  ambiguous  decision  tables.  The  problems  posed  by 
these  tables  were  analyzed  in  (King  73),  who  concluded  that  a  pair  of 
inconsistent  rules  could  be  taken  to  mean  that  both  action  sets  should 
be  applied,  or  either  one,  in  an  arbitrary  manner.  As  will  be  seen  in 
Chapter  5,  this  behavior  can  be  characterized  by  using  relations  instead 
of  functions  in  the  description  of  decision  tables. 

3.4.  Representation  and  Evaluation  of  Functions 

Decision  trees  can  be  used  as  representations  of  functions  for 
purposes of software or hardware implementation, or as an analytical
tool. Some of those applications will now be reviewed.

Since  a  k-ary  decision  tree  is  the  exact  equivalent  of  a  hardware 
multiplexer  tree,  it  is  possible  to  synthesize  any  function  of  k-ary 
variables  with  only  one  type  of  element — a  k-to-one  multiplexer;  the 
resulting circuit will have few interconnections and will lend itself
well  to  large  scale  integration.  Moreover,  some  multiplexers  may  feed 
several  others  (have  more  than  one  "parent")  to  form  a  multiplexer 
network,  the  equivalent  of  a  decision  diagram.  This  has  attracted 
researchers  in  switching  theory,  who  studied  the  problem  of  minimizing 
the  number  of  multiplexers  (that  is,  the  number  of  nodes  in  the  tree)  or 


the  worst  case  propagation  delay  (that  is,  the  height  of  the  tree).  An 
algorithm  to  minimize  the  size  of  a  multiplexer  tree  was  presented  in 
(Mange 78, Cerny 79); however, it necessitates the generation of all
implicants of the function as well as of their subsequent combinations
in  order  to  find  the  optimal  solution  and  thus  requires  exponential 
time.  The  same  idea  appeared  in  (Davio  77)  for  the  minimization  of  the 
height  of  the  tree.  Both  approaches  were  regrouped  and  generalized  to 
k-ary  functions  in  (Thayse  78) . 

Since decision trees model multistage decision processes, they have
found widespread applications in sequential pattern recognition.
Although a tree is often synthesized directly from the problem without
attempt to optimize its cost (Rounds 79, You 76), there have been a few
studies of the optimization problem. An approach using game tree
searching techniques was proposed in (Slagle 71), who noted that this
approach can be modified to yield optimal dynamic programming or
branch-and-bound algorithms. The heuristic procedure presented in
(Sethi  80)  for  decision  tables  had  first  been  designed  for  pattern 
recognition  purposes  (Sethi  77).  More  importantly,  a  dynamic  programming 
algorithm  similar  to  those  of  (Bayes  73,  Schumacher  76)  was  published 
by  (Meisel  73)  [and  later  refined  by  (Payne  77)].  This  algorithm  takes 
a  multivalued  function  as  input  and  produces  a  binary  decision  tree 
optimal  with  respect  to  expected  or  worst  case  testing  cost,  or  storage 
cost,  or  any  weighted  combination  of  those  costs. 

The  evaluation  of  Boolean  functions  using  decision  trees  with  an 
eye  toward  software  implementation  has  given  rise  to  several  optimal  and 
heuristic algorithms. A selection criterion similar to that used in
(Cerny 79) was proposed in (Halpern 74a, b); it necessitates the
generation  of  all  prime  implicants  for  the  function  and  its  dual  (hence 
requiring  exponential  time),  implicants  which  are  then  ranked  in  terms  of 
their  probability  to  cost  ratio  (where  the  cost  of  an  implicant  is  that 
of  the  optimal  tree  for  it).  Variables  which  appear  in  both  the  best 
implicant  for  the  function  and  that  for  its  dual  are  then  selected.  The 
author  shows  that  this  strategy  is  optimal  for  symmetric  functions  (those 
that  remain  invariant  under  any  permutation  of  the  variables) ,  but  offers 
no  analysis  of  performance  in  the  general  case.  (Breitbart  75a)  presented 
a heuristic selection rule for monotone Boolean functions with unity costs
and uniform probability distribution, which requires finding the minimal
sum-of-products  form  (known  to  be  unique  for  monotone  functions)  and  is 
thus  exponential  time;  in  a  later  analysis  (Breitbart  78),  it  was  shown 
that  trees  constructed  by  this  rule  can  have  an  expected  number  of  tests 
at  least  (n/log  n)  times  larger  than  the  optimal  trees  for  functions  of 
n  variables.  The  same  authors  also  adapted  the  work  of  (Reinwald  66)  to 
Boolean  functions  (Breitbart  75b);  their  article  contains  a  theorem 
relating  Chow  parameters  to  the  expected  testing  cost  of  a  decision  tree 
(which  will  be  further  examined  in  Chapter  5)  and  provides  most  of 
Reinwald  and  Soland’s  results  in  a  much  more  readable  form.  In  the 
special  case  where  the  function  is  composed  of  a  disjunction  of  disjoint 
functions,  (Perl  76)  showed  that  testing  all  functions  in  some  fixed 
order  is  no  worse  than  changing  the  order  of  testing  on  some  paths, 
thereby  generalizing  some  results  of  (Slagle  64). 


The  worst  case  number  of  tests  (the  height  of  a  tree)  indicates  a 
minimum  number  of  operations  that  must  be  performed  in  order  to  compute 
a  function;  as  such,  it  is  a  useful  technique  for  deriving  lower  bounds 
to  be  used  in  concrete  complexity  theory.  Of  particular  interest  is  the 
worst  case  behavior  of  Boolean  functions.  A  function  is  said  to  be 
exhaustive  if  every  tree  for  the  function  has  a  path  on  which  all 
variables  are  tested;  (Rivest  76a,  b)  proved  that  almost  all  (in  the 
sense  of  asymptotics)  Boolean  functions  are  exhaustive.  (The  proof 
will  be  presented  in  Section  4.3.3  along  with  some  new  results  on 
exhaustiveness. ) 
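
For small functions, exhaustiveness can be tested directly from the definition by computing the minimum height over all decision trees. The Python sketch below does this by exhaustive recursion, so it is practical only for a handful of variables; the names `min_height` and `is_exhaustive` are assumptions of this example.

```python
from functools import lru_cache
from itertools import product


def min_height(f, n):
    """Minimum worst case number of tests over all trees for f."""

    @lru_cache(maxsize=None)
    def height(node):
        # None marks an untested variable; 0 or 1 marks a tested one.
        free = [i for i, v in enumerate(node) if v is None]
        values = set()
        for bits in product((0, 1), repeat=len(free)):
            point = list(node)
            for i, b in zip(free, bits):
                point[i] = b
            values.add(f(tuple(point)))
        if len(values) == 1:
            return 0  # constant subfunction: no further test is needed
        return min(
            1 + max(height(node[:i] + (b,) + node[i + 1:]) for b in (0, 1))
            for i in free
        )

    return height((None,) * n)


def is_exhaustive(f, n):
    """True if every tree for f has a path testing all n variables."""
    return min_height(f, n) == n
```

For instance, the parity of three variables is exhaustive, whereas the function that returns the second variable when the first is one and the third variable otherwise can always be evaluated with two tests.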

Lower bounds are also helpful if one is to compare decision trees
with other representations of functions. In a fundamental paper,
(Lee 59) introduced binary decision diagrams (which he called binary
decision programs) and compared them to Boolean formulae as means of
representation of Boolean functions. Using an ingenious proof technique,
he was able to show that no Boolean function of n variables needs more
than $4 \cdot 2^n/n - 1$ diagram nodes and that some require more than
$1/2 \cdot 2^n/n$ nodes. (A straightforward extension of his proof shows that
the number of diagram nodes needed to represent any k-ary function of
n variables is bounded below by $k^n/kn$ and above by $2 \cdot k^n/n$, for $k > 2$.)
By contrast, it is well-known (Savage 76) that a Boolean formula may
require up to $O(2^n/\log n)$ operators for a function of n variables;
moreover, every operation must be carried out at each evaluation of the
function, so that the expected number of operations needed to evaluate
a Boolean formula is $O(2^n/\log n)$ while a decision diagram will never
need more than n comparisons. The author justly concluded that binary
decision  diagrams  should  be  further  investigated  as  efficient  repre¬ 
sentations  of  Boolean  functions.  Some  of  the  interesting  properties 
of  decision  diagrams  and  trees  were  independently  rediscovered  by 
(Prather  78)  (who  called  them  atomic  digraphs)  and  (Akers  78a,  b) , 
who  investigated  their  use  in  testing  switching  circuits.  (This  will 
be  further  pursued  in  Chapter  6.) 

Finally,  a  decision  tree  can  be  used  to  model  the  control  structure 
of a program (Prather 78), in particular in relation with Ianov's
schemata (Ianov 60). It then becomes important to recognize identical
structures, that is, to decide whether or not two free Boolean graphs
are equivalent (i.e., represent the same function). (Fortune 78) showed
that this problem is NP-complete; however, (Blum 80) provided a
probabilistic algorithm that solves the problem in polynomial time.

3.5.  Summary 

Most of the results available about decision trees as representations
of discrete functions concern algorithms for constructing optimal or
suboptimal trees. Two types of optimal algorithms have been published.
The first uses branch-and-bound techniques and is applicable to all
optimality criteria for which some kind of lower bound can be derived;
the second uses dynamic programming and is applicable only to criteria
that are "compatible" with decomposition in that optimal solutions must
have optimal subsolutions. Numerous suboptimal algorithms have been
proposed, all of which can result in trees that are arbitrarily far from
the optimal. Most of these rules fall into one of three categories,
namely, the splitting strategy, the information algorithm, and the
local minimization (using a lower bound).

The problem of constructing optimal decision trees is known to be
NP-hard (for most criteria) in the case of binary identification. In
the general case, a dynamic programming algorithm is available, which
provides an efficient solution (less than $O(s^2)$ for input of size s).
The three main heuristics are of complexity $O(s \cdot \log s)$ or $O(s^2)$, but
several  others  have  exponential  complexity  due  to  the  incorporation  of 
an  NP-hard  problem  (such  as  minimum  cover  or  minimum  sum-of-products 
form) . 

Much  less  attention  has  been  devoted  to  finding  general 
characteristics  of  tree  and  diagram  representations  of  functions.  It 
is  known,  however,  that  almost  all  Boolean  functions  are  exhaustive, 
that  is,  have  tree  representations  of  maximal  height.  Further,  it  has 
been shown that at most $O(2^n/n)$ nodes are needed in a decision diagram
to represent any Boolean function of n variables (versus $O(2^n/\log n)$
operators in a Boolean formula), which can then be evaluated in $O(n)$
comparisons (versus $O(2^n/\log n)$ operations with a Boolean formula).

Most  of  the  work  done  has  been  in  relation  with  Boolean  functions 
and  it  must  be  noted  that  an  important  subset  of  the  results  do  not 
generalize  to  multivalued  functions. 


CHAPTER  4 


MEASURES  AND  OPTIMIZATION  PROBLEMS 


4.1.  Introduction 

The  use  of  decision  trees  as  models  of  discrete  functions  presents 
a  problem  of  uniqueness  of  representation,  since  a  given  function  has, 
in  general,  numerous  decision  tree  representations.  As  indicated  in 
Section  2.3,  several  criteria  have  been  proposed  to  select  a  standard 
(optimal)  representation;  however,  the  construction  of  this  standard 
tree  may  be  quite  a  complex  problem  for  several  criteria.  Moreover, 
there  is  a  fundamental  question  of  choice,  since  at  least  seven  criteria 
have  been  defined. 

The  complexity  of  the  optimization  problem  for  the  diverse  criteria 
and  types  of  functions  has  received  rather  limited  attention,  as 
mentioned  in  Chapter  3.  The  decision  problem  for  each  of  the  seven 
criteria  is  easily  seen  to  be  in  NP.  The  exact  complexity  of  the 
decision  and  optimization  problems  will  be  discussed  in  this  chapter. 

The  choice  of  a  criterion  has  rarely  been  discussed  in  the 
literature.  Most  researchers  used  the  expected  testing  cost  or  the 
storage  cost  as  corresponding  to  demands  on  time  and  memory,  respectively, 
in  software  implementations.  (Verhelst  72)  argued  that  the  storage  cost 
is  rather  unimportant,  but  did  not  further  justify  his  use  of  the 
expected  testing  cost.  The  question  was  avoided  in  several  articles 
(Yasui 71, Payne 77) by proposing to find a tree representation that
would simultaneously satisfy a few criteria. This approach, however, is
inapplicable since, as will be shown in the following sections, almost
all  the  proposed  criteria  are  incompatible  in  the  sense  that  they  cannot 
be  optimized  simultaneously.  This  chapter  investigates  the  relationships 
between  the  various  measures  and  discusses  the  merits  of  each;  it  shows 
that  only  one  criterion  satisfactorily  reflects  the  complexity  of 
decision  trees. 

4.2.  The  Case  of  Binary  Identification 

The  various  applicable  measures  will  be  first  examined  under  the 
assumption  of  unity  costs  and  uniform  distribution  of  objects.  Under 
those  conditions,  the  storage  cost  of  a  tree  reduces  to  its  number  of 
nodes  (which  is  fixed,  as  noted  in  Section  2.3.4),  and  its  expected 
testing  cost  is  equal  to  its  normalized  testing  cost,  of  which  the  path 
length  is  a  fixed  multiple;  hence,  only  four  criteria  are  applicable, 
namely,  the  height,  the  path  length,  and  the  minimum  reverse  and 
maximum  profiles. 

Proposition  4.1.  The  maximum  profile  is  incompatible  with  the 
other  three  measures. 

Proof: Consider the identification problem with five objects,
{a, b, c, d, e}, and four tests, {$T_1$ = {a}, $T_2$ = {a, b}, $T_3$ = {a, b, c},
$T_4$ = {a, b, c, d}}. The trees with maximum profile test $T_1$ or $T_4$ first,
while those optimal with respect to the other three criteria test $T_2$ or
$T_3$ first, with the resulting measures listed below.

first test       leaf profile       height   path length

$T_1$ or $T_4$   (0, 1, 1, 1, 2)    4        14
$T_2$ or $T_3$   (0, 0, 3, 2, 0)    3        12

It is further noted that the function used in the proof also has trees
rooted in $T_1$ or $T_4$ with a leaf profile of (0, 1, 0, 4, 0) and thus a
height of three and a path length of thirteen; hence, minimizing the
height of a tree does not result in the optimization of any other measure.
(On the other hand, it is obvious that minimizing the reverse profile
will minimize the height.) The known relationships between the four
measures are summarized in Figure 4.1, where the empty set symbol, ∅,
means that the corresponding measures are incompatible (that is, that the
set of all trees optimal under one criterion has no intersection with
that of all trees optimal under the other criterion). The exact
relationship between the reverse profile and the path length is unknown.
It is easy to construct an example showing that they are not strictly
equivalent, in the sense that the set of all trees for the example does
not get ranked in the same order by both criteria; however, this author
was not able to construct a problem where the optimal trees for both
criteria did not coincide.
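
The leaf profiles quoted for the five-object example of Proposition 4.1 can be reproduced by enumerating every identification tree. The following Python sketch is one way to do so; the encoding of tests as sets of objects and all function names are assumptions of this example, and the enumeration is exponential, so it is meant only for very small problems.

```python
def depth_multisets(objects, tests):
    """All sorted tuples of leaf depths over the trees identifying objects."""
    if len(objects) == 1:
        return {(0,)}
    found = set()
    for t in tests:
        inside, outside = objects & t, objects - t
        if inside and outside:  # skip tests that do not split the set
            for left in depth_multisets(inside, tests):
                for right in depth_multisets(outside, tests):
                    found.add(tuple(sorted(d + 1 for d in left + right)))
    return found


def leaf_profile(depths, n):
    """Number of leaves at each depth 0, ..., n - 1."""
    return tuple(depths.count(d) for d in range(n))


def profiles_by_first_test(objects, tests):
    """Map each test name to the leaf profiles of the trees rooted in it."""
    n = len(objects)
    pool = list(tests.values())
    result = {}
    for name, t in tests.items():
        inside, outside = objects & t, objects - t
        if inside and outside:
            result[name] = {
                leaf_profile(tuple(d + 1 for d in left + right), n)
                for left in depth_multisets(inside, pool)
                for right in depth_multisets(outside, pool)
            }
    return result
```

Applied to the objects {a, b, c, d, e} and the four tests of the proof, the enumeration yields the profiles (0, 1, 1, 1, 2) and (0, 1, 0, 4, 0) for trees rooted in the first or fourth test, and the single profile (0, 0, 3, 2, 0) for trees rooted in the second or third test.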

The  introduction  of  nonuniform  probabilities  results  in  further 
incompatibilities  and  one  more  measure  (the  expected  testing  cost) ;  in 
fact,  the  only  two  measures  that  are  not  incompatible  are,  trivially, 
the  reverse  profile  and  the  height.  Storage  and  testing  costs  invalidate 
the  use  of  leaf  profiles,  but  give  rise  to  another  valid  measure  (the 
tree  storage  cost);  all  measures  are  then  pairwise  Incompatible. 

Figure 4.1. Known relationships between the four measures
applicable  to  binary  identification  trees. 


The  decision  problem  for  the  path  length  measure  was  shown  to  be 
NP-complete  in  (Hyafil  76),  even  with  unity  costs  and  uniform  distribution. 
Their  construction  [a  straightforward  reduction  from  the  exact  cover  by 
three  sets — see  (Garey  79,  p.  53)]  can  be  used  to  show  that  the  decision 
problems  for  the  reverse  profile  and  the  expected  and  worst  case  testing 
costs  are  also  NP-complete.  Using  standard  completion  and  search 
techniques  (as  developed  in  Section  2.2.3),  it  is  an  easy  matter  to  show 
that  the  optimization  problems  for  the  storage  cost  and  the  total  (and 
thus  also  normalized),  expected,  and  worst  case  testing  costs  are  all 
NP-equivalent. The optimization problem for the reverse profile is not
known  to  be  NP-easy:  although  one  can  apply  the  completion  technique, 
a  binary  search  (required  since  the  number  of  distinct  profiles  grows 
exponentially  with  the  number  of  leaves)  would  necessitate  a  polynomial 
time  ranking  algorithm  for  leaf  profiles,  which  is  as  yet  undiscovered. 
Table  4.1  summarizes  the  known  results  about  the  complexity  of  decision 
tree  optimization  in  binary  identification  problems. 

Table 4.1

The Complexity of Optimal Binary
Identification Trees

Criterion                              Complexity

Storage cost                           NP-equivalent
Total and normalized testing costs     NP-equivalent
Expected testing cost (E)              NP-equivalent
Worst case testing cost (h)            NP-equivalent
Reverse profile                        NP-hard
Maximum profile                        -

4.3.  The  Case  of  Completely  Specified  Boolean  Functions 

4.3.1.  Relationships  between  measures:  conjectures  and  counterexamples 
The relationships between the various measures will be examined in
the simplest (and least conducive to incompatibilities) case, where all
costs are unity and the probability distribution on the input combinations
is uniform. All of the eight measures defined in Section 2.3.1
are  then  applicable. 

Proposition  4.2.  The  maximum  profile  is  incompatible  with  any 
other  measure. 

Proof: Two counterexamples will be used. First, let $f_a$ be the
Boolean function of four variables given by the formula:

fa(Xi,  X2,  x^,  x^)  =  x^x^  +  ’‘2’'3^4  * 

The trees with maximum profile use as first test either $x_1$ or $x_3$ and
also have minimal expected testing cost; the optimal tree for all other
measures, however, tests $x_4$ first and is unique (except for the minimum
diagram storage cost, which can also be attained by testing $x_1$ or $x_3$
first, but with a structure different from that of the maximum profile
trees). The various measures for the three types of trees are listed
below:

first test       profile           tree      diagram   total     normalized   worst      expected
                                   storage   storage   testing   (H)          case (h)   (E)

$x_4$            (0, 0, 0, 8, 0)   7         6         24        3            3          3
$x_1$ or $x_3$   (0, 0, 1, 4, 4)   8         6         30        3.33         4          3
$x_1$ or $x_3$   (0, 0, 2, 2, 4)   7         7         26        3.25         4          2.75

Hence,  the  maximum  profile  (and,  incidentally,  the  minimum  expected 
testing  cost)  is  incompatible  with  the  minimum  reverse  profile,  the 
diagram  storage  cost,  and  the  total,  normalized,  and  worst  case  testing 
costs. Secondly, let $f_b$ be the Boolean function of five variables given
by  the  formula: 

^b^^l’  ^2’  ^3’  ’‘a’  ^  ^I’^S  ^3’‘a^ 

+  (X2  +  x^)  •  (x^x^  +  x^x^)  . 


The trees with the maximum profile test $x_5$ first, while those optimal
with respect to all other measures first test $x_1$, with the following
results:

first test   profile               tree      diagram   total     normalized   worst      expected
                                   storage   storage   testing   (H)          case (h)   (E)

$x_1$        (0, 0, 1, 1, 8, 4)    13        8         57        4.07         5          3.5
$x_5$        (0, 0, 1, 2, 2, 12)   16        9         76        4.47         5          3.625

Hence the maximum profile is also incompatible with the minimum tree
storage and expected testing costs. □

The  first  function  also  shows  that  minimizing  the  tree  or  diagram 
storage  costs  does  not  optimize  any  other  criterion,  while  the  second 
demonstrates  the  same  result  about  the  minimum  worst  case  testing  cost. 
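
The tabulated values can be cross-checked from the leaf profiles alone. The following Python sketch assumes, as the numbers in the two tables suggest, that the first measure counts the internal nodes of the tree, that the total testing cost is the sum of the leaf depths, that the normalized cost divides this sum by the number of leaves, and that the expected cost weights a leaf at depth k by 2^(-k) (uniform distribution, unity costs). The function name is illustrative only, and the diagram storage cost is left out because it cannot be derived from a profile.

```python
from fractions import Fraction


def measures_from_profile(profile):
    """profile[k] is the number of leaves at depth k of a binary decision tree.

    Returns (internal nodes, total cost, normalized cost, worst case cost,
    expected cost), using the assumed definitions stated in the lead-in.
    """
    leaves = sum(profile)
    nodes = leaves - 1                       # a binary tree has leaves - 1 internal nodes
    total = sum(k * count for k, count in enumerate(profile))
    normalized = Fraction(total, leaves)
    worst = max(k for k, count in enumerate(profile) if count)
    expected = sum(Fraction(k * count, 2 ** k) for k, count in enumerate(profile))
    return nodes, total, normalized, worst, expected


if __name__ == "__main__":
    for profile in [(0, 0, 0, 8, 0), (0, 0, 1, 4, 4), (0, 0, 2, 2, 4),
                    (0, 0, 1, 1, 8, 4), (0, 0, 1, 2, 2, 12)]:
        print(profile, measures_from_profile(profile))
```

Exact fractions are used so that values such as 57/14 can be compared with the rounded table entries without floating-point noise.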

Proposition 4.3. The normalized testing cost is incompatible with
any other measure except, possibly, the worst case testing cost;
moreover, minimizing the normalized testing cost may involve the
introduction of redundant tests.

Proof: Let $f_c$ be the Boolean function of five variables given by
the formula:

f^(x^,  X2,  x^i  Xj)  =  x^X2  +  X2  ©  ©  x^  ©  x^  , 

where $\oplus$ stands for summation modulo 2. The optimal trees for all measures
but H test $x_1$ or $x_2$ first and use no redundant test, while the trees with
minimum normalized testing cost may test any variable first and, in case
$x_1$ or $x_2$ is chosen, use redundant tests. Two diagrams rooted in $x_1$ are
shown in Figure 4.2, the left being optimal with respect to all criteria
but H, for which the right diagram is optimal; the corresponding measures
are listed below:


tree     profile               $\alpha_{\min}$  $\delta_{\min}$  $n_{\min}$  $H_{\min}$  $h_{\min}$  $E_{\min}$
left     (0, 0, 2, 0, 0, 16)   17    8     84    4.67    5     3.5
right    (0, 0, 0, 4, 0, 16)   19    9     92    4.6     5     4
Figure 4.2. The two diagrams, (a) and (b), for the counterexample of
Proposition 4.3.

It  is  noted  that  the  test  of  as  the  right  child  of  the  root  is 
redundant. 

The known relationships between measures are summarized in
Figure 4.3 (which incorporates a few results from unmentioned
counterexamples). These results disprove several conjectures or
assertions found in the literature. In particular, (Yasui 71)
claimed that the number of nodes of a tree is a special case of
the expected testing cost; this was reiterated as a conjecture in
(Breitbart 75a). That it is false can easily be seen by examining
the trees for the Boolean function of four variables given by the
formula:

f(Xj^,  X2,  Xj,  x^)  -  x^X2  +  Xj^X^  +  x^x^x^  . 
Figure 4.3. Known relationships between the eight measures
applicable to binary decision trees.

Entries left blank in the diagram of Figure 4.3 stand for
relationships that could neither be proved nor disproved. It is
conjectured that these relationships, most particularly the implication
$(n_{\min} \Rightarrow \alpha_{\min})$, hold true. As for identification problems, the
introduction of nonuniform probability distributions or nonunity costs
renders all measures pairwise incompatible.

4.3.2.  Uninteresting  measures 

From the preceding results, it clearly appears that the normalized
testing cost is unsuitable as a measure of complexity on decision trees.
Also, since the notion of leaf profile cannot be easily extended to take
variables of different costs into account, both the maximum and the
minimum reverse profiles are unsuitable. Finally, the tree storage cost
is not an accurate measure of the true hardware (or memory) requirements,
since the diagram storage cost is never larger and often much smaller.
(It is recalled that $O(2^n)$ tree nodes are needed in order to represent
any Boolean function of n variables, while $O(2^n/n)$ diagram nodes suffice.)
Thus, the tree storage cost is rejected in favor of the diagram storage
cost.

Of the four remaining measures, three (n, h, E) are related to
problems of tree usage and one ($\delta$) to problems of implementation.

While the diagram storage cost certainly is the relevant measure of
implementation problems (except, perhaps, for the difficulty of its
optimization, which is not known to be polynomially feasible, whereas the
tree storage cost can be efficiently optimized), it appears less
characteristic of the complexity of a function than measures of testing
costs. In particular, the storage cost does not set decision trees apart
from other models of functions as strikingly as the testing costs do;
for instance, although it is an order of magnitude better than Boolean
formulae for logic functions, $O(2^n/n)$ versus $O(2^n/\log n)$, this
remains small compared to the ratio for the expected testing cost,
$O(n)$ versus $O(2^n/\log n)$. Thus, the storage cost does not capture an
essentially new aspect of functions, while the expected testing cost
definitely does.

Finally, of the three measures of testing cost, only two (h and E)
are concerned with the performance of a tree representation. The total
testing cost does not make use of the probability distribution, yet
neither does it measure a performance extreme (as the worst case testing
cost does). Although it is of interest in the case of binary identification
(since it is then a measure of the cost incurred in identifying one
object in each category, that is, in producing each output of the
function exactly once), it does not, in general, correspond to practical
concerns that an engineer or designer might have about a function.

This leaves two measures of special interest, namely, the expected
and worst case testing costs, which are discussed in turn in the next
sections.

4.3.3.  The  worst  case  testing  cost 

As mentioned in Section 3.4, (Rivest 76a, b) proved that almost
all Boolean functions are exhaustive, that is, have a maximal worst case
testing cost. In this section, his argument will be reproduced and
further results presented. The worst case testing cost will then be
discussed in the light of those results.

Let f be a Boolean function of n variables and let $M_f$ denote the
set of all minterms of f, that is

\[ M_f = \{ (x_1, \ldots, x_n) \mid f(x_1, \ldots, x_n) = 1 \} . \]

Let the weight of an input vector, $w(x_1, \ldots, x_n)$, be the sum of the
components of the vector (i.e., the number of components that are equal
to one). In a decision tree representation of f, a leaf labelled 1 at
depth k represents $2^{n-k}$ minterms (since n - k variables are unspecified);
in particular, if the tree has height h, then $2^{n-h}$ divides $|M_f|$.
Consider the generating function

\[ g_f(z) = \sum_{i=0}^{n} a_i z^i , \tag{4.3-1} \]

where $a_i$ is the number of minterms of weight i. Then

\[ g_f(1) = \sum_{i=0}^{n} a_i = |M_f| . \]

A leaf at depth k contributes a summand $z^i (1 + z)^{n-k}$ to $g_f(z)$, where i
is the weight of the k-subvector. (Since the remaining n - k components
are unspecified, each can take the value 0 or 1, corresponding to the 1
and the z in the term (1 + z).) Thus a tree of height h has each leaf
labelled with a 1 contributing a multiple of $(1 + z)^{n-h}$.

Lemma 4.1. If the worst case number of tests for the function f
is h, then $(1 + z)^{n-h}$ divides $g_f(z)$. □

As a special case, setting z = 1 gives the result stated above, that is,
$2^{n-h}$ divides $|M_f|$. Now, letting z = -1 yields $g_f(-1) = 0$ for h < n, so
that the sum of the coefficients of the odd power terms is equal to that
of the even power terms.


Lemma  4.2.  Any  Boolean  function  which  is  not  exhaustive  has  an 
equal  number  of  odd  and  even  weight  minterms. 

But now, the number of Boolean functions of n variables with an equal
number of odd and even weight minterms is

\[ N(n) = \sum_{j=0}^{2^{n-1}} \binom{2^{n-1}}{j}^{2} = \binom{2^n}{2^{n-1}} , \tag{4.3-2} \]

which, for large n, becomes approximately $2^{2^n} / \sqrt{\pi \cdot 2^{n-1}}$ (Knuth 73, p. 71).
Since the number of nonexhaustive Boolean functions cannot exceed
$N(n)$ by Lemma 4.2, and since there are $2^{2^n}$ Boolean functions of n
variables, the fraction of all Boolean functions that are nonexhaustive
cannot exceed $N(n)/2^{2^n}$. But:

\[ \lim_{n \to \infty} N(n)/2^{2^n} = \lim_{n \to \infty} \left( 2^{2^n} / \sqrt{\pi \cdot 2^{n-1}} \right) / 2^{2^n} = \lim_{n \to \infty} 1 / \sqrt{\pi \cdot 2^{n-1}} = 0 , \tag{4.3-3} \]

which proves the following.
Theorem 4.1. Almost all Boolean functions are exhaustive. □

The above theorem is the announced result from (Rivest 74).
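
For small n, Lemma 4.2 and the count (4.3-2) can be checked by brute force. The Python sketch below is only an illustration: it encodes a function as a truth table indexed by the integer whose bit i holds variable i + 1 (an encoding chosen here, not taken from the text), computes the least worst case number of tests by trying every variable at every node, and then compares nonexhaustive functions against the parity balance of their minterms. All function names are hypothetical.

```python
from functools import lru_cache
from itertools import product
from math import comb


@lru_cache(maxsize=None)
def min_height(table):
    """Least worst case number of tests over all decision trees for `table`."""
    if len(set(table)) == 1:
        return 0                                  # constant function: a single leaf
    n = len(table).bit_length() - 1               # len(table) == 2 ** n
    best = n
    for i in range(n):
        low = tuple(v for x, v in enumerate(table) if not (x >> i) & 1)
        high = tuple(v for x, v in enumerate(table) if (x >> i) & 1)
        best = min(best, 1 + max(min_height(low), min_height(high)))
    return best


def parity_balanced(table):
    """True when odd weight and even weight minterms are equally numerous."""
    odd = sum(v for x, v in enumerate(table) if bin(x).count("1") % 2)
    return 2 * odd == sum(table)


def check(n):
    """Return (number of balanced functions, number of nonexhaustive ones, lemma holds?)."""
    tables = list(product((0, 1), repeat=2 ** n))
    balanced = sum(parity_balanced(t) for t in tables)
    nonexhaustive = [t for t in tables if min_height(t) < n]
    return balanced, len(nonexhaustive), all(parity_balanced(t) for t in nonexhaustive)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, check(n), comb(2 ** n, 2 ** (n - 1)))
```

The enumeration is doubly exponential in n, so it is only practical for very small n; it is meant as a sanity check, not as a substitute for the generating function argument.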

It  is  natural  to  suspect  that  there  exist  large  groups  of  exhaustive 
functions.  It  will  now  be  shown  that  symmetric  and  threshold  functions 
are  exhaustive. 

A Boolean function of n variables, $f(x_1, \ldots, x_n)$, is said to be
symmetric if and only if, for each permutation, $\sigma$, over n letters:

\[ f(x_1, \ldots, x_n) = f(x_{\sigma(1)}, \ldots, x_{\sigma(n)}) . \]

Equivalently, a function is symmetric if and only if there exists a set
of k numbers ($k \le n$), $\{a_1, \ldots, a_k\}$, where $0 \le a_1 < \ldots < a_k \le n$, such
that the function is equal to one exactly when $a_i$ of its variables are
equal to one, for some i, $1 \le i \le k$. Such a function has a single tree
structure, since testing one variable rather than another does not change
the resulting subfunctions; in particular, after testing $n - a_k$ variables
and finding them all equal to zero, the remaining $a_k$ variables must all
be tested, since the function will be equal to one if all are equal to
one. This proves the following result.


Theorem 4.2. All symmetric Boolean functions are exhaustive. □


Let P be the defining property of a class of functions such that,
if f possesses P, then both $f_{(x_i=0)}$ and $f_{(x_i=1)}$ possess P, for any
choice of $x_i$; in other words, P is preserved by Shannon's decomposition.
All functions in the class are then exhaustive if and only if they have
no redundant variables and, in any Shannon's decomposition, at least
one of the two subfunctions has no redundant variables.


A class of functions of considerable interest is that of unate
functions, that is, functions for which a Boolean formula can be
written which uses no variable in both complemented and uncomplemented
form. Since decision trees are invariant under complementation of
variables, it can be assumed without loss of generality that all
variables are used uncomplemented; this defines the class of positive
unate functions, which can be shown to be monotone increasing
(Harrison 65). Both properties are clearly preserved by Shannon's
decomposition.


Let then $f(x_1, \ldots, x_n)$ be an intrinsic positive unate function;
from the above, f is exhaustive if and only if, for each choice of $x_i$,
either $f_{(x_i=0)}$ or $f_{(x_i=1)}$ is intrinsic, that is, there cannot be found
$x_j$, $x_k$ ($j, k \ne i$) such that $f_{(x_i=0)}$ does not depend on $x_j$ and
$f_{(x_i=1)}$ does not depend on $x_k$. Without loss of generality, let i = 1,
j = 2, and k = 3, and let x stand for $(x_4, \ldots, x_n)$. Then f is not
exhaustive if and only if

\[ f(0, 0, x_3, x) = f(0, 1, x_3, x) \]

and

\[ f(1, x_2, 0, x) = f(1, x_2, 1, x) . \tag{4.3-4} \]

Since f is monotone increasing, it must be the case that

\[ f(1, x_2, 1, x) \ge f(0, x_2, x_3, x) , \]

so that, by topological sorting, the following relations are obtained:

\[ f(1, 1, 1, x) = f(1, 1, 0, x) \ge f(1, 0, 1, x) = f(1, 0, 0, x) \]
\[ \ge f(0, 1, 1, x) = f(0, 0, 1, x) \]
\[ \ge f(0, 1, 0, x) = f(0, 0, 0, x) . \tag{4.3-5} \]

Let the four pairs of function points in (4.3-5) be denoted a, b, c, and
d in this order. For any given choice of x, these four pairs can be
mapped to the same value or to two distinct values (0 and 1), with the
following partitions:

(1) (abcd) mapped to the same value; then $x_1$, $x_2$, and $x_3$ are redundant
for that choice of x;

(2) (abc) mapped to 1 and (d) to 0; then $x_2$ is redundant for that choice
of x;

(3) (ab) mapped to 1 and (cd) to 0; then $x_2$ and $x_3$ are redundant for that
choice of x;

(4) (a) mapped to 1 and (bcd) to 0; then $x_3$ is redundant for that choice
of x.

No other choice is possible due to the monotone property. This shows
that all unate functions of three or fewer variables are exhaustive,
since then there is no choice for x and one of (1)-(4) must hold,
contradicting the assumption of intrinsicalness. At the same time,
however, it shows how to construct a nonexhaustive unate function of four
variables: it is enough to find x' > x'' such that $f(x_1, x_2, x_3, x')$ is
partitioned according to (2) and $f(x_1, x_2, x_3, x'')$ according to (4),
since then $x_2$ is redundant in one case and $x_3$ in the other, but both are
necessary overall. One such function is given by the formula

\[ f(x_1, x_2, x_3, x_4) = x_1 x_2 + x_1 x_4 + x_3 x_4 \]

and it is easily verified that it has tree representations of height 3.
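
The height claim can be verified mechanically. The Python sketch below assumes the four-variable function reads x1 x2 + x1 x4 + x3 x4, which is the form obtained from partitions (2) and (4) with x' = 1 and x'' = 0; the helper names and the bit encoding of inputs are illustrative choices. It also evaluates a threshold function (at least two of four inputs equal to one) for contrast.

```python
from functools import lru_cache


def truth_table(f, n):
    """Tabulate f over all inputs; bit i of the index holds variable i + 1."""
    return tuple(int(bool(f(*[(x >> i) & 1 for i in range(n)]))) for x in range(2 ** n))


@lru_cache(maxsize=None)
def min_height(table):
    """Least worst case number of tests over all decision trees for `table`."""
    if len(set(table)) == 1:
        return 0
    n = len(table).bit_length() - 1
    best = n
    for i in range(n):
        low = tuple(v for x, v in enumerate(table) if not (x >> i) & 1)
        high = tuple(v for x, v in enumerate(table) if (x >> i) & 1)
        best = min(best, 1 + max(min_height(low), min_height(high)))
    return best


def is_monotone(table):
    """True when raising any single input from 0 to 1 never lowers the output."""
    n = len(table).bit_length() - 1
    return all(table[x] <= table[x | (1 << i)] for x in range(len(table)) for i in range(n))


# Assumed reading of the counterexample; see the lead-in.
unate = truth_table(lambda x1, x2, x3, x4: (x1 and x2) or (x1 and x4) or (x3 and x4), 4)
# A threshold function with unit weights and threshold 2.
two_of_four = truth_table(lambda a, b, c, d: a + b + c + d >= 2, 4)

if __name__ == "__main__":
    print(is_monotone(unate), min_height(unate))            # monotone, height 3
    print(is_monotone(two_of_four), min_height(two_of_four))  # monotone, height 4
```

Testing x1 first leaves x3 x4 on one branch and x2 + x4 on the other, each needing two further tests, which is where the height of 3 comes from.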

Thus, not all unate functions are exhaustive, but what of a more
restricted class? A special case of unate functions is that of threshold
(or linearly separable) functions, where a Boolean function of n
variables, $f(x_1, \ldots, x_n)$, is a threshold function if and only if there
exist a set of weights, $(w_1, \ldots, w_n)$, and a threshold, T, such that
$f(x_1, \ldots, x_n)$ is one exactly when

\[ \sum_{i=1}^{n} w_i x_i \ge T . \]

Since unate functions can be taken to be positive, weights and threshold
are assumed positive without loss of generality. Let now w stand for
$(w_4, \ldots, w_n)$; substituting weights and threshold into the four pairs
of (4.3-5) yields:

\[ a = (w_1 + w_2 + w \cdot x^t ;\; w_1 + w_2 + w_3 + w \cdot x^t) ; \]
\[ b = (w_1 + w \cdot x^t ;\; w_1 + w_3 + w \cdot x^t) ; \]
\[ c = (w_3 + w \cdot x^t ;\; w_2 + w_3 + w \cdot x^t) ; \]
\[ d = (w \cdot x^t ;\; w_2 + w \cdot x^t) , \]
where $x^t$ stands for the transpose of x. As seen above, a function will
not be exhaustive only if x' > x'' can be found such that $f(x_1, x_2, x_3, x')$
is partitioned as (abc)(d) and $f(x_1, x_2, x_3, x'')$ as (a)(bcd). Using
only boundary values around the threshold, this implies, for the first
partition:

\[ w_1 + w \cdot x'^t \ge T, \quad w_3 + w \cdot x'^t \ge T, \quad w_2 + w \cdot x'^t < T , \]

and for the second:

\[ w_1 + w_2 + w \cdot x''^t \ge T, \quad w_1 + w_3 + w \cdot x''^t < T, \quad w_2 + w_3 + w \cdot x''^t < T . \]

The first three inequalities yield $w_2 < w_3$ and the second three yield
$w_2 > w_3$, a contradiction. Hence x' and x'' cannot be found, which proves
the following.

Theorem 4.3. All linearly separable functions are exhaustive. □

The above results show that the worst case testing cost does not
discriminate between most Boolean functions. Thus, although it can be
efficiently minimized [the dynamic programming algorithm of (Bayes 73)
can be applied], the worst case testing cost must be rejected as a
measure of the complexity of decision trees.

4.3.4.  The  expected  testing  cost 
Given an intrinsic Boolean function of n variables, $f(x_1, \ldots, x_n)$,
where testing variable $x_i$ incurs cost $c_i$, and given a probability
distribution, p, on the input combinations, the expected testing cost
of any tree representation, T, of f is obviously bounded by:

\[ \min \{ c_i \mid i = 1, \ldots, n \} \le E(T) \le \sum_{i=1}^{n} c_i . \tag{4.3-6} \]

In particular, if p is uniform and all the costs unity, then:

\[ n/2^n + \sum_{i=1}^{n} i/2^i = 2 - 2^{1-n} \le E(T) \le n . \]
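
The closed form on the left can be confirmed numerically. The short Python check below uses exact rational arithmetic; reading the left-hand sum as the expected cost of a chain-shaped tree (one leaf at each depth 1 to n - 1 and two leaves at depth n, as for the conjunction of all variables) is an interpretation added here, and the function name is illustrative.

```python
from fractions import Fraction


def chain_cost(n):
    """n/2^n + sum_{i=1}^{n} i/2^i, evaluated exactly."""
    return Fraction(n, 2 ** n) + sum(Fraction(i, 2 ** i) for i in range(1, n + 1))


if __name__ == "__main__":
    for n in range(1, 11):
        # The identity 2 - 2^(1-n) is checked for each n.
        assert chain_cost(n) == 2 - Fraction(2, 2 ** n)
        print(n, chain_cost(n))
```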

In fact, Boolean functions with unity costs and uniform probability
distributions require an expected number of tests that converges to n;
this can be shown as follows. Let F(n) be the number of Boolean functions
of n variables and let I(n) be the number of those that are intrinsic;
thus:

\[ F(n) = 2^{2^n} \]

and

\[ I(n) = \sum_{i=0}^{n} (-1)^{n-i} \cdot \binom{n}{i} \cdot F(i) . \]

As remarked in Section 2.2.1, almost all Boolean functions are intrinsic,
that is:

\[ \lim_{n \to \infty} I(n)/F(n) = 1 , \]

with rapid convergence. Now, the expected value of E for a function of
n variables, E(n), must be at least as large as E(n - 1) for nonintrinsic
functions and equal to 1 + E(n - 1) otherwise; hence the recurrence:

\[ E(n) = \{ [F(n) - I(n)] \cdot E(n - 1) + I(n) \cdot [1 + E(n - 1)] \} / F(n) \]
\[ = E(n - 1) + I(n)/F(n) . \tag{4.3-7} \]

Since I(n)/F(n) rapidly converges to 1 (for n = 4, it is already within
two percent), the expected value of E is essentially equal to n for
large values of n.
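
The speed of convergence is easy to tabulate. The following Python sketch evaluates F(n) and the inclusion-exclusion sum for I(n) with exact integers and then iterates recurrence (4.3-7) as stated, starting from E(0) = 0 (the starting value is an assumption made here, since a function of no variables needs no test).

```python
from math import comb


def F(n):
    """Number of Boolean functions of n variables."""
    return 2 ** (2 ** n)


def I(n):
    """Number of Boolean functions of n variables that depend on all of them."""
    return sum((-1) ** (n - i) * comb(n, i) * F(i) for i in range(n + 1))


def expected_tests(n):
    """Iterate E(m) = E(m - 1) + I(m)/F(m), assuming E(0) = 0."""
    value = 0.0
    for m in range(1, n + 1):
        value += I(m) / F(m)
    return value


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, I(n), I(n) / F(n), expected_tests(n))
```

For n = 4 the ratio is 64594/65536, or about 0.986, in line with the two percent figure quoted above.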

The above result, however, does not indicate that minimizing the
expected testing cost is useless, since the presence of nonuniform costs
and probabilities results in the wide range of values described by
(4.3-6). Moreover, as seen in Section 3.3, this minimization can be
accomplished very efficiently by dynamic programming, in time
$O(s^{\log_2 3} \cdot \log s)$ for an input of size $s = O(2^n)$. This result has
often been misinterpreted, most recently in (Standish 80, p. 176), by
taking n as the input size and declaring the algorithm to be exponential
time since it requires $O(n \cdot 3^n)$ operations. An exception is the
case where the probability distribution is only incompletely specified
and the function itself given by its minterms or by a simplified formula
(as in decision tables where probabilities are specified only for rules,
which are not necessarily simple), resulting in an input, the size of
which may be polynomial in n. In that case, however, there is no properly
optimal tree (because the expected testing cost cannot be computed
exactly, but can only be bounded) unless the combinations grouped in a
single probability assignment are assumed equally likely, in which case
it is conjectured that the optimization problem is NP-hard (and thus
NP-equivalent, since it is clearly NP-easy).
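
The origin of the $3^n$ factor can be seen in a small memoized recursion over subcubes, in which each variable is either fixed at 0, fixed at 1, or still free. The Python sketch below is an illustration written for this purpose and is not claimed to be the algorithm of Section 3.3: it handles completely specified Boolean functions only, re-enumerates the points of each subcube instead of reusing partial sums, and therefore does not meet the stated time bound. The function and parameter names are hypothetical.

```python
from functools import lru_cache
from itertools import product


def min_expected_cost(f, p, costs):
    """Minimum expected testing cost of a decision tree for f.

    f and p map 0/1 tuples to output values and probabilities;
    costs[i] is the cost of testing variable i (0-based).
    """
    n = len(costs)

    def points(cube):
        # cube[i] is 0, 1, or None (None means variable i is still untested)
        return product(*[(0, 1) if c is None else (c,) for c in cube])

    @lru_cache(maxsize=None)
    def best(cube):
        pts = list(points(cube))
        if len({f[x] for x in pts}) == 1:
            return 0.0                        # f is constant here: a leaf
        mass = sum(p[x] for x in pts)         # probability of reaching this node
        options = []
        for i, c in enumerate(cube):
            if c is None:
                low = cube[:i] + (0,) + cube[i + 1:]
                high = cube[:i] + (1,) + cube[i + 1:]
                options.append(costs[i] * mass + best(low) + best(high))
        return min(options)

    return best((None,) * n)


if __name__ == "__main__":
    pts = list(product((0, 1), repeat=3))
    uniform = {x: 1 / 8 for x in pts}
    conjunction = {x: int(all(x)) for x in pts}
    print(min_expected_cost(conjunction, uniform, (1, 1, 1)))
```

There are $3^n$ subcubes and at most n candidate tests in each, which is the count behind the $O(n \cdot 3^n)$ figure; with $s = 2^n$ this is $s^{\log_2 3} \cdot \log_2 s$.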

4.4. The General Case

In the general case, the function given may be partial only. Then,
even in the case of Boolean functions with unity costs and uniform
distributions, no two measures are compatible (except, possibly, the
external path length and the number of nodes; the implication
$n_{\min} \Rightarrow \alpha_{\min}$ could neither be proved nor disproved, although it is
clearly false with arbitrary costs or probability distributions). The
reasons given in the previous section for choosing testing costs
rather than storage costs as measures of the complexity of decision
trees remain valid, as does the selection of the expected testing cost
over the other measures of testing cost.

The  dynamic  programming  algorithm  for  the  minimization  of  the 
expected  testing  cost  is  applicable  to  the  general  case,  so  that  the 
complexity  of  optimizing  E  is  polynomial  when  the  probability  distribution 
is  completely  specified,  and  probably  NP-equivalent  when  inputs  are 
assumed  to  be  equally  likely  and  the  function  is  partial.  A  further 
study  of  the  expected  testing  cost  will  be  presented  in  the  next  chapter. 


CHAPTER  5 


ACTIVITY  OF  A  VARIABLE 


5.1.  Introduction 

In  the  previous  chapter,  the  expected  testing  cost  was  chosen  as  the 
measure  of  complexity  for  decision  trees  and  diagrams.  This  does  not 
imply  that  the  minimum  expected  testing  cost  for  a  function  should  be 
chosen  as  a  measure  of  that  function's  complexity;  for  instance,  such 
a  measure  would  not  be  implementation-independent.  It  does  mean, 
however,  that  the  relationship  between  a  function  and  the  expected  testing 
costs  of  its  decision  tree  representations  must  be  investigated.  In 
particular,  as  pointed  out  in  Chapter  1,  a  characterization  of  the 
influence  of  individual  variables  on  the  measure  is  desirable. 

The  following  sections  develop  such  a  characterization,  the 
activity  of  a  variable,  and  show  its  relation  to  decision  trees.  The 
concept  is  then  extended  to  hierarchies  of  recursive  relations  and  its 
application  to  conventional  problems  (such  as  the  heuristic  construction 
of  suboptimal  trees)  and  to  the  development  of  a  measure  of  function 
complexity  is  investigated. 

5.2.  Definition  and  Results 

The  contribution  of  any  variable  to  the  expected  testing  cost  of  a 
tree  varies  between  zero  and  that  variable's  testing  cost.  Clearly,  a 
redundant  variable,  although  it  may  be  used  in  a  tree,  is  not  a  priori 
expected  to  make  any  contribution  since  it  need  not  be  tested. 
Conversely, an indispensable variable, that is, one which, regardless of
the  values  of  the  other  variables,  must  be  tested  in  order  to  determine 
the  value  of  the  function,  is  expected  to  contribute  its  entire  testing 
cost.  Hence,  an  a  priori  measure  of  the  influence  of  a  variable  on  the 
expected  testing  cost  of  any  tree  representation  must  vary  between  zero — 
for  a  redundant  variable — and  that  variable's  testing  cost — for  an 
indispensable  variable. 

Moreover,  such  a  measure  should  be  compatible  with  the  decomposition 
process  characteristic  of  decision  trees.  That  is,  the  measure  on  the 
whole  function  should  be  related  to  that  of  the  subfunctions  in  much 
the  same  way  as  the  expected  testing  cost  is.  This  leads  to  the 
following  definition. 

Definition 5.1. Given a (partial) function of n variables,
$f(x_1, \ldots, x_n)$, where each variable, $x_i$, takes on $m_i$ values and has an
associated testing cost of $c_i$, and where the probability of an input
vector, $(x_1, \ldots, x_n)$, is denoted $p(x_1, \ldots, x_n)$, the activity,
$a_f(x_i)$, of variable $x_i$ with respect to the function f is defined by the
two relations:

(a) $0 \le a_f(x_i) \le c_i$ ;

moreover, $a_f(x_i) = 0$ if and only if $x_i$ is redundant, and $a_f(x_i) = c_i$
if and only if $x_i$ is indispensable;

(b) for any $x_j$, $j \ne i$,

\[ a_f(x_i) = \sum_{k=0}^{m_j - 1} p(x_j = k) \cdot a_{f_{(x_j=k)}}(x_i) . \] □

The second relation requires that the activity of a variable with
respect to a function be equal to the expected value of the activities
of that variable with respect to the subfunctions resulting from a
decomposition.

Exactly one function can satisfy Definition 5.1. This can be seen
by examining the case of functions of two variables. Let $f(x_1, x_2)$ be
such a function and let it be decomposed around $x_2$. The resulting
subfunctions, $f_{(x_2=k)}$ for $k = 0, \ldots, m_2 - 1$, are functions of a
single variable, $x_1$. Consequently, that variable is either redundant or
indispensable and thus, by Definition 5.1(a), its activity is either null
or equal to its cost. Hence, by Definition 5.1(b), the activity of $x_1$
with respect to f is

\[ a_f(x_1) = \sum_{k} p(x_2 = k) \cdot c_1 = c_1 \cdot \sum_{k} p(x_2 = k) , \]

where the sum is taken over all k such that $f(x_1, k)$ depends on $x_1$. A
straightforward induction argument then completes the proof of the
following result.


Theorem 5.1. Exactly one function satisfies the definition of
activity:

\[ a_f(x_i) = c_i \cdot p_f(x_i) , \qquad p_f(x_i) = \sum p(x_1 = k_1, \ldots, x_{i-1} = k_{i-1}, x_{i+1} = k_{i+1}, \ldots, x_n = k_n) , \]

where the sum is taken over all values, $k_1, \ldots, k_{i-1}, k_{i+1}, \ldots, k_n$,
such that, for some pair of values $(v_1, v_2)$ of $x_i$,
$f(k_1, \ldots, k_{i-1}, v_1, k_{i+1}, \ldots, k_n)$ and
$f(k_1, \ldots, k_{i-1}, v_2, k_{i+1}, \ldots, k_n)$ are specified
and different. □

The quantity, $p_f(x_i)$, will be called the probability that f is sensitized
to $x_i$; it is the a priori probability that $x_i$ will be needed in testing
for the values of the function. Conversely, the a priori probability
that $x_i$ will be useless, $1 - p_f(x_i)$, will be denoted $\bar{p}_f(x_i)$.

As defined above, the activity is a generalization of a concept
developed from Boolean calculus in (Bozoyan 75). When the function is
a completely specified Boolean function, the activity can be written as

\[ a_f(x_i) = c_i \cdot p(\partial f / \partial x_i = 1) , \]

where $\partial f / \partial x_i$ is the Boolean difference of f with respect to $x_i$. [For
details on the Boolean difference, see (Thayse 73).] If it is further
assumed that the probability distribution is uniform, then the activity
can be expressed as

af(x^)  ■  ♦  ?^(x^)/2"  , 

where  denotes  the  Chow  parameter  of  x^  with  respect  to  f 

(Winder  71) . 

Since the activity of a variable is an a priori measure of its
contribution to the expected testing cost, the difference between the
actual contribution and the activity is a measure of the loss incurred
by testing the variable. In particular, if the variable is tested at
the root, its actual contribution is equal to its testing cost, so that
the loss incurred becomes

\[ \ell_f(x_i) = c_i - c_i \cdot p_f(x_i) = c_i \cdot \bar{p}_f(x_i) . \]

Example 5.1. Let f be the partial function of Example 2.10, given
by

f:  (0, 0, 0) → a     (1, 0, 0) → a     (2, 0, 1) → b
    (0, 0, 1) → a     (1, 0, 1) → b     (2, 1, 0) → a
    (0, 1, 1) → c     (1, 1, 0) → c     (2, 1, 1) → b

p:  (0, 0, 0) → 0.00  (1, 0, 0) → 0.10  (2, 0, 0) → 0.00
    (0, 0, 1) → 0.05  (1, 0, 1) → 0.05  (2, 0, 1) → 0.20
    (0, 1, 0) → 0.00  (1, 1, 0) → 0.10  (2, 1, 0) → 0.10
    (0, 1, 1) → 0.10  (1, 1, 1) → 0.15  (2, 1, 1) → 0.15

and the testing costs, $c_1 = 5$, $c_2 = 3$, $c_3 = 2$. The probability of f
being sensitized is computed for each variable as follows:

\[ p_f(x_1) = p(x_2 = 0, x_3 = 1) + p(x_2 = 1, x_3 = 0) + p(x_2 = 1, x_3 = 1) \]
\[ = (0.05 + 0.05 + 0.20) + (0.00 + 0.10 + 0.10) + (0.10 + 0.15 + 0.15) = 0.90 ; \]

\[ p_f(x_2) = p(x_1 = 0, x_3 = 1) + p(x_1 = 1, x_3 = 0) \]
\[ = (0.05 + 0.10) + (0.10 + 0.10) = 0.35 ; \]

\[ p_f(x_3) = p(x_1 = 1, x_2 = 0) + p(x_1 = 2, x_2 = 1) \]
\[ = (0.10 + 0.05) + (0.10 + 0.15) = 0.40 . \]

Then the other quantities defined above are:

\[ \bar{p}_f(x_1) = 0.10, \quad \bar{p}_f(x_2) = 0.65, \quad \bar{p}_f(x_3) = 0.60 ; \]
\[ a_f(x_1) = 5 \cdot 0.90 = 4.50, \quad a_f(x_2) = 3 \cdot 0.35 = 1.05, \quad a_f(x_3) = 2 \cdot 0.40 = 0.80 ; \]
\[ \ell_f(x_1) = 0.50, \quad \ell_f(x_2) = 1.95, \quad \ell_f(x_3) = 1.20 . \] □
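
The figures of this example can be recomputed directly from the two tables. The Python sketch below follows the characterization of Theorem 5.1: for each setting of the other variables, it checks whether two values of the variable under study give specified and different outputs, and if so adds the probability of that setting. Variables are indexed from 0 in the code, and the names `F`, `P`, `DOMAINS`, `COSTS`, and `sensitization` are illustrative.

```python
from itertools import product

DOMAINS = (3, 2, 2)          # x1 ranges over {0, 1, 2}; x2 and x3 are binary
COSTS = (5, 3, 2)
# Specified entries of the partial function; missing inputs are unspecified.
F = {(0, 0, 0): "a", (0, 0, 1): "a", (0, 1, 1): "c",
     (1, 0, 0): "a", (1, 0, 1): "b", (1, 1, 0): "c",
     (2, 0, 1): "b", (2, 1, 0): "a", (2, 1, 1): "b"}
P = {(0, 0, 0): 0.00, (0, 0, 1): 0.05, (0, 1, 0): 0.00, (0, 1, 1): 0.10,
     (1, 0, 0): 0.10, (1, 0, 1): 0.05, (1, 1, 0): 0.10, (1, 1, 1): 0.15,
     (2, 0, 0): 0.00, (2, 0, 1): 0.20, (2, 1, 0): 0.10, (2, 1, 1): 0.15}


def sensitization(i):
    """Probability that f is sensitized to variable i (0-based)."""
    total = 0.0
    others = [range(m) for j, m in enumerate(DOMAINS) if j != i]
    for rest in product(*others):
        # All inputs that agree with `rest` and differ only in variable i.
        line = [rest[:i] + (v,) + rest[i:] for v in range(DOMAINS[i])]
        if len({F[x] for x in line if x in F}) > 1:
            total += sum(P[x] for x in line)
    return total


if __name__ == "__main__":
    for i, cost in enumerate(COSTS):
        prob = sensitization(i)
        print(f"x{i + 1}: p = {prob:.2f}, activity = {cost * prob:.2f}, "
              f"loss = {cost * (1 - prob):.2f}")
```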

The  concepts  of  activity  and  loss  are  closely  related  to  the 
expected  testing  cost.  As  observed  above,  each  variable  makes  a  minimal 
contribution  equal  to  its  activity;  then  a  loss  is  added  each  time  the 
variable  is  tested.  This  relationship  is  formalized  below. 

Theorem 5.2. The expected testing cost, E(T), of a decision
tree, T, for the function f can be expressed as

\[ E(T) = \sum_{i=1}^{n} a_f(x_i) + \sum_{k} p(f_k) \cdot \ell_{f_k}(x_k) , \]

where the second sum is taken over all internal nodes and $x_k$ and $f_k$
refer to the variable and the subfunction associated with the k-th node.


Proof: The proof is by induction on n, the number of variables.
For n = 1, the basis is easily verified: the variable space is just
an m-tuple and there are only two possible tree structures. Assume
that the theorem holds for all functions of up to and including n - 1
variables and choose $x_i$ to be the root of T. This determines $m_i$
subfunctions, each of n - 1 variables, so that the inductive hypothesis
applies and, for each subfunction $f_{(x_i=j)}$, $j = 0, \ldots, m_i - 1$, with
corresponding tree $T_j$, the expected testing cost can be written as

\[ E(T_j) = \sum_{s \ne i} a_{f_{(x_i=j)}}(x_s) + \sum_{s} p(f_s \mid x_i = j) \cdot \ell_{f_s}(x_s) , \]

where the second sum is taken over all internal nodes, s, of $T_j$ (node
probabilities being conditional on $x_i = j$). But,

\[ E(T) = c_i + \sum_{j=0}^{m_i - 1} p(x_i = j) \cdot E(T_j) \]

and, upon substitution, this becomes

\[ E(T) = c_i - \ell_f(x_i) + \sum_{j=0}^{m_i - 1} \Big[ p(x_i = j) \cdot \sum_{s \ne i} a_{f_{(x_i=j)}}(x_s) \Big] + \sum_{k} p(f_k) \cdot \ell_{f_k}(x_k) , \]

where the last sum is taken over all internal nodes, k, of T. However,
the following relations obviously hold:

\[ \sum_{s=1}^{n} a_f(x_s) = a_f(x_i) + \sum_{j=0}^{m_i - 1} p(x_i = j) \cdot \sum_{s \ne i} a_{f_{(x_i=j)}}(x_s) \]

and

\[ c_i - \ell_f(x_i) = a_f(x_i) . \]

Substitution of those two equalities in the expression for E(T) yields
the conclusion. □

A simplified form of this theorem was proved in (Breitbart 75b), using
Chow parameters, for completely specified monotone Boolean functions
of uniformly distributed variables with unity costs.

Corollary 5.1. The expected testing cost of any decision tree, T,
for the function f having $x_i$ as root test is bounded by

\[ \sum_{j=1}^{n} c_j \ge E(T) \ge \ell_f(x_i) + \sum_{j=1}^{n} a_f(x_j) . \] □

This corollary, in simplified form, was proved in (Reinwald 66) and is
implicit in (Breitbart 75b). Both references use it as the basis for a
branch-and-bound search algorithm to find a tree with minimal expected
testing cost.

Those results stress the importance of the sum of the activities
of the variables as an implementation-independent measure of the cost
incurred in determining the values of a function. This motivates the
following definition.

Definition 5.2. The intrinsic cost, I(f), of the function f is
the quantity:

\[ I(f) = \sum_{i=1}^{n} a_f(x_i) . \] □


Example  5.2.  Consider  again  the  function  of  Example  5.1.  Its 
intrinsic  cost  is 

1(f)  =  4.50  +  1.05  +  0.80  =  6.35  . 

From  Corollary  5.1,  the  lower  bound  on  the  expected  testing  cost  of  any 
tree  can  be  computed  for  each  choice  of  root: 

Ib(x^)  =  6.35  +  0.50  *  6.85  ; 

lb(x2)  =  6.35  +  1.95  =  8.30  ; 

lb(x2)  =  6.35  +  1.20  =  7.55  . 

Theorem 5.2 is illustrated by considering the two trees of Figure 2.5
(page 31), which are reproduced in Figure 5.1 with the loss and
probability of each internal node. The expected testing cost of tree (a)
is then

$$\begin{aligned}
E(a) &= 6.35 + 1.00 \cdot 0.50 + 0.15 \cdot 0.00 + 0.40 \cdot 1.50 + 0.45 \cdot 3.00 + 0.15 \cdot 0.00 + 0.25 \cdot 0.00 \\
     &= 6.35 + 0.50 + 0.60 + 1.35 = 8.80
\end{aligned}$$

and that of tree (b) is

$$\begin{aligned}
E(b) &= 6.35 + 1.00 \cdot 1.20 + 0.30 \cdot 1.00 + 0.70 \cdot 0.00 + 0.20 \cdot 0.00 + 0.15 \cdot 0.00 \\
     &= 6.35 + 1.20 + 0.30 = 7.85 ,
\end{aligned}$$

which are the values found in Example 2.10.


□ 
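The arithmetic of Example 5.2 can be replayed with a short Python sketch. It assumes, as the two sums above suggest, that Theorem 5.2 expresses the expected testing cost as the intrinsic cost plus the probability-weighted losses of the internal nodes. The function name `expected_cost` and the list layout are hypothetical; the numbers are copied from the example.

```python
def expected_cost(intrinsic_cost, nodes):
    """Intrinsic cost plus the sum of probability * loss over internal nodes."""
    return intrinsic_cost + sum(p * loss for p, loss in nodes)


activities = [4.50, 1.05, 0.80]    # a_f(x_1), a_f(x_2), a_f(x_3)
root_losses = [0.50, 1.95, 1.20]   # losses of x_1, x_2, x_3 at the root
intrinsic = sum(activities)        # I(f) = 6.35

# Lower bound of Corollary 5.1 for each choice of root test.
lower_bounds = [intrinsic + loss for loss in root_losses]

# (probability, loss) of each internal node, as listed in the example.
tree_a = [(1.00, 0.50), (0.15, 0.00), (0.40, 1.50),
          (0.45, 3.00), (0.15, 0.00), (0.25, 0.00)]
tree_b = [(1.00, 1.20), (0.30, 1.00), (0.70, 0.00),
          (0.20, 0.00), (0.15, 0.00)]

cost_a = expected_cost(intrinsic, tree_a)
cost_b = expected_cost(intrinsic, tree_b)
print(round(cost_a, 2), round(cost_b, 2))
```

Both costs lie above the lower bound for the corresponding root, as Corollary 5.1 requires.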


5.3.  Extension  to  Recursive  Relations 

In this section, the concepts of decision tree and activity are
extended to relations, recursive relations, and hierarchies of relations.
This allows the modelling of systems with simple feedback loops and of
systems composed of several subsystems. In the case of decision tables,
this makes it possible to consider ambiguous tables, recursive tables,
and tables incorporating calls to subtables in place of actions (each of
which is beyond the reach of published analyses).

The extension to relations on finite sets is of particular interest
in the case of interdependent functions which must be represented by a
single tree [as in (Cerny 79)] and in the case of ambiguous decision
tables using the interpretation of (King 73).

A relation, R, from the set of input vectors to the set of output
values, might specify no more than one output for each input combination,
in which case it is a (partial) function; it may, however, specify more
than one output, in which case it can be arbitrarily decided to specify
any particular output or subset of outputs. For consistency of notation,
it will be assumed that an unspecified entry is in fact related to the
whole output set, so that any subset of output values may be specified
for such an input combination. Such a relation can obviously be
represented by decision trees and diagrams. Testing costs are defined
as for functions, and so are probability distributions. (Thus, the
distribution depends only on the input values and is unaffected by the
choice of one or another output subset.) The concepts of activity and
loss are then generalized in the obvious way by modifying the definition
of $p_f(x_i)$, a modification which results from the fact that two output
subsets specified by the relation are considered different if their
intersection is empty. It is readily verified that, under these
assumptions, all results previously stated for partial functions remain
valid for relations.

Example 5.3. Let R be the relation from the input set
$\{0, 1, 2\} \times \{0, 1\}^2$ to the output set $\Omega = \{a, b, c, d\}$,
where all three variables have unity costs and the relation and the
probability distribution are given by:

R:                           (1, 0, 0) → {b, d}     (2, 0, 0) → {d}
     (0, 0, 1) → {a}         (1, 0, 1) → {b}        (2, 0, 1) → {b}
     (0, 1, 0) → {a, b, c}   (1, 1, 0) → {b, c, d}  (2, 1, 0) → {c}
     (0, 1, 1) → {a}         (1, 1, 1) → {a}        (2, 1, 1) → {c}

p:   (0, 0, 0) → 0.05        (1, 0, 0) → 0.10       (2, 0, 0) → 0.00
     (0, 0, 1) → 0.10        (1, 0, 1) → 0.10       (2, 0, 1) → 0.05
     (0, 1, 0) → 0.10        (1, 1, 0) → 0.10       (2, 1, 0) → 0.05
     (0, 1, 1) → 0.20        (1, 1, 1) → 0.05       (2, 1, 1) → 0.10


Since all variables have unity costs, $a_R(x_i) = p_R(x_i)$, so that:

$$a_R(x_1) = p(x_2 = 0, x_3 = 1) + p(x_2 = 1, x_3 = 1) = (0.10 + 0.10 + 0.05) + (0.20 + 0.05 + 0.10) = 0.60 ;$$

similarly, $a_R(x_2) = 0.35$ and $a_R(x_3) = 0.20$. Hence, the intrinsic cost is

$$I(R) = 0.60 + 0.35 + 0.20 = 1.15 .$$


Choosing $x_1$ as the root test for a decision tree results in a lower
bound on the expected testing cost of 1.15 + (1 - 0.60) = 1.55. A possible
decision tree, T, rooted in $x_1$ is shown in Figure 5.2 together with the
losses of its nodes. Its expected testing cost is

$$E(T) = 1.15 + 1.00 \cdot 0.40 + 0.35 \cdot 4/7 = 1.75 ,$$

and it is in fact one of the optimal trees for R. □
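The activities of Example 5.3 can be recomputed mechanically. The Python sketch below is an illustration only: it assumes that a relation is sensitized to a variable, for a given combination of the other variables, exactly when the output subsets obtained over all values of that variable have an empty common intersection, and that the unspecified entry (0, 0, 0) is related to the whole output set. The names `outputs` and `activity` are hypothetical.

```python
from itertools import product

DOMAINS = [(0, 1, 2), (0, 1), (0, 1)]
ALL_OUTPUTS = "abcd"

# Relation R of Example 5.3; (0, 0, 0) is absent because it is unspecified.
R = {
    (1, 0, 0): "bd",  (2, 0, 0): "d",
    (0, 0, 1): "a",   (1, 0, 1): "b",   (2, 0, 1): "b",
    (0, 1, 0): "abc", (1, 1, 0): "bcd", (2, 1, 0): "c",
    (0, 1, 1): "a",   (1, 1, 1): "a",   (2, 1, 1): "c",
}
P = {
    (0, 0, 0): 0.05, (1, 0, 0): 0.10, (2, 0, 0): 0.00,
    (0, 0, 1): 0.10, (1, 0, 1): 0.10, (2, 0, 1): 0.05,
    (0, 1, 0): 0.10, (1, 1, 0): 0.10, (2, 1, 0): 0.05,
    (0, 1, 1): 0.20, (1, 1, 1): 0.05, (2, 1, 1): 0.10,
}


def outputs(x):
    """Output subset related to input vector x (whole set if unspecified)."""
    return frozenset(R.get(x, ALL_OUTPUTS))


def activity(i, cost=1.0):
    """Cost of variable i times the probability that R is sensitized to it."""
    others = [d for k, d in enumerate(DOMAINS) if k != i]
    sensitized = 0.0
    for rest in product(*others):
        vectors = [rest[:i] + (v,) + rest[i:] for v in DOMAINS[i]]
        if not frozenset.intersection(*(outputs(x) for x in vectors)):
            sensitized += sum(P[x] for x in vectors)
    return cost * sensitized


activities = [activity(i) for i in range(3)]
intrinsic_cost = sum(activities)
print([round(a, 2) for a in activities], round(intrinsic_cost, 2))
```

Under these assumptions the sketch reproduces the activities 0.60, 0.35, and 0.20 and the intrinsic cost 1.15 given in the example.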

As a further extension to the foregoing concepts, the case of
recursive functions or relations is now considered. A recursive
relation is a relation that, for certain input vectors, does not specify
output values, but calls for a new evaluation of itself. In the
following discussion, it is assumed that an unspecified entry can only
be replaced by a subset of output values (not by a recursive call). This
allows the computation of the probability, e, that an evaluation will be
made without recursive calls; if e is one, then the relation is not
recursive, while if e is zero, then the relation will never yield a
value, but will keep issuing recursive calls ad infinitum.
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Figure  5.2.  The  decision  tree  of  Example  5.3  with  the  losses  of 
its  nodes. 

A  recursive  relation  can  be  represented  by  an  infinite  decision 
tree,  where  each  recursive  call  leads  to  the  root  of  a  new  component 
tree;  it  will  be  assumed  that  all  component  trees  have  the  same 
structure.  Under  this  assumption,  it  is  also  possible  to  represent 
a  recursive  relation  by  a  diagram  with  cycles,  each  cycle  leading  back 
to  the  root  of  the  diagram. 

A first question about such representations concerns an upper bound
on their testing cost. Such a bound was set by Corollary 5.1 for
nonrecursive relations as the sum of the testing costs of the variables,
but can evidently be exceeded by recursive relations. The following
proposition provides the answer.

Proposition 5.1. Let R be a recursive relation on n variables,
$x_1, \ldots, x_n$, with costs $c_1, \ldots, c_n$, and let e be the
probability that no recursive call will be needed; then the expected
testing cost of any representation of R is no larger than

$$(1/e) \cdot \sum_{i=1}^{n} c_i .$$

Proof: The probability of a recursive call occurring in any
evaluation is (1 - e); an evaluation results in, at worst, the test
of all variables, for a cost of

$$\sum_{i=1}^{n} c_i ;$$

thus the total cost is no larger than

$$\sum_{k=0}^{\infty} (1 - e)^k \cdot \sum_{i=1}^{n} c_i = (1/e) \cdot \sum_{i=1}^{n} c_i . \qquad \Box$$
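The geometric-series argument of the proof can be checked numerically. The following Python sketch is illustrative: the function names and the sample costs are hypothetical, and the only fact used is that the series of powers of (1 - e) sums to 1/e for 0 < e <= 1.

```python
def recursion_cost_bound(costs, e):
    """Upper bound of Proposition 5.1: (1/e) times the sum of the costs."""
    if not 0.0 < e <= 1.0:
        raise ValueError("e must lie in (0, 1]; e = 0 means endless recursion")
    return sum(costs) / e


def truncated_bound(costs, e, terms):
    """Partial sum of the series: at most `terms` successive evaluations."""
    return sum((1.0 - e) ** k for k in range(terms)) * sum(costs)


costs = [1.0, 1.0, 1.0]   # hypothetical unit testing costs
e = 0.5                   # hypothetical probability of no recursive call

print(recursion_cost_bound(costs, e))    # closed form
print(truncated_bound(costs, e, 60))     # partial sum approaches the bound
```

The partial sums increase toward the closed form, which is why the bound holds for any number of recursive calls.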

The  actual  cost  of  a  tree  representation  can  be  computed  by  solving 
a  simple  first  degree  equation;  the  probability  that  the  relation  will 
take  on  a  specific  value  can  be  obtained  from  a  similar  equation,  with 
the  provision  that  entries  for  which  several  values  are  specified  are 
set  to  the  specific  value  under  consideration  wherever  possible. 

The activity is generalized to recursive relations by redefining its
computation for relations on one variable as follows. If the relation
is not recursive, the activity is that defined previously; otherwise, it
is equal to the product of the variable's testing cost times the
probability that the relation will take on more than one value. This
quantity will be called the tree activity; the corresponding loss, the
tree loss, is equal to the testing cost of the variable minus its tree
activity. The same quantities multiplied by (1/e) will be referred to
as the diagram activity and diagram loss.


Theorem 5.3. Let R be a recursive relation and let a decision
procedure for R be represented by a diagram, D, and by an infinite
tree, T. The expected testing cost of the procedure, E(D) = E(T), is
equal to (a) the sum of the diagram activities and that, taken over all
internal nodes of the diagram, of the diagram losses, or (b) the sum,
taken over the infinite tree, of the tree activities and of the tree
losses.

Proof: The proof relies on that of Theorem 5.2 and on simple
considerations on the series $1, (1 - e), (1 - e)^2, \ldots$ and its sum,
1/e. If, in the diagram, D, all recursive calls are replaced by leaves,
the cost of the resulting tree is the sum of the tree activities and
that, taken over all internal nodes of the tree, of the tree losses.
Introducing recursion (either as a diagram or as an infinite tree)
results in a series of invocations, the probabilities of which are
described by the series $(1 - e)^k$. □

Corollary  5.1  is  similarly  extended. 

Finally, all of the above results can easily be extended to
hierarchies of (recursive) relations by analyzing a hierarchy component
by component and putting the results together using the probability of
each component. It is noted, however, that the results are not directly
extendable to indirect recursion (that is, through a chain of relations)
unless all recursive calls are to the same relation at the top of the
hierarchy. This latter case is now demonstrated along with the other
extensions in a practical example.

Example 5.4. Consider a situation in which a monitoring program
must periodically evaluate several system variables. If the sampled
values point to a satisfactory status, the program waits for a specific
period of time and examines the variables again. Otherwise, either a
malfunction is detected and the program takes some action and stops, or
further analysis is required and additional variables are examined to
determine whether the program should resume its normal cycle or take
some action and stop. The first part of the examination (the normal
cycle) is described by the relation R1, which includes calls both to
itself and to the second relation, R2 (the exception cycle), which
includes calls to R1 (thereby producing indirect recursion).

In this simple example, R1 is a relation between
$\{0, 1, 2\} \times \{0, 1\}^2$ and the set of actions, $\Omega = \{a, b\}$,
and R2 is a relation between $\{0, 1\}^3$ and $\Omega$, as specified below,
together with the probability distributions, $p_1$ and $p_2$. The variables
will be denoted $x_1$, $x_2$, and $x_3$ for R1, and $y_1$, $y_2$, and $y_3$
for R2; their testing costs are listed below.


R1:   (0, 0, 0) → {R1}    (1, 0, 0) → {R1}    (2, 0, 0) → {b}
      (0, 0, 1) → {R1}    (1, 0, 1) → {a}     (2, 0, 1) → {R2}
      (0, 1, 0) → {R1}    (1, 1, 0) → {R1}    (2, 1, 0) → {R2}
      (0, 1, 1) → {R1}    (1, 1, 1) → {R2}

p_1:  (0, 0, 0) → 0.70    (1, 0, 0) → 0.05    (2, 0, 0) → 0.01
      (0, 0, 1) → 0.05    (1, 0, 1) → 0.01    (2, 0, 1) → 0.01
      (0, 1, 0) → 0.05    (1, 1, 0) → 0.04    (2, 1, 0) → 0.01
      (0, 1, 1) → 0.05    (1, 1, 1) → 0.01    (2, 1, 1) → 0.01

R2:   (0, 0, 0) → {R1}    (1, 0, 0) → {R1}
      (0, 0, 1) → {b}     (1, 0, 1) → {a}
      (0, 1, 0) → {a}     (1, 1, 0) → {a}
      (0, 1, 1) → {b}

p_2:  (0, 0, 0) → 0.25    (1, 0, 0) → 0.25
      (0, 0, 1) → 0.10    (1, 0, 1) → 0.10
      (0, 1, 0) → 0.10    (1, 1, 0) → 0.10
      (0, 1, 1) → 0.05    (1, 1, 1) → 0.05

c(x_1) = 4.5,  c(x_2) = 9,  c(x_3) = 9,  c(y_1) = 50,  c(y_2) = 45,  c(y_3) = 36.

The  analysis  treats  R1  and  R2  separately  and  considers  a  structure 
from  which  all  recursive  calls  have  been  removed.  Once  this  structure 
has  been  analyzed  by  the  methods  developed  above,  the  results  are  put 
together  using  p(R2),  the  probability  that  R2  is  called  from  R1  in  a 
given  evaluation.  Recursion  is  then  taken  into  account  by  multiplying 
the  results  by  (1/e),  where  e  is  the  overall  probability  that  no 
recursion  will  be  needed. 

p(R2) is found to be 0.01 + 0.01 + 0.01 = 0.03; similarly, p(R1),
the probability that R1 will be called in a given evaluation of R2, is
equal to 0.25 + 0.25 = 0.50; finally, the probability that no recursive
call will be needed is the probability of reaching a value directly in
R1 or through a single call to R2:

$$e = 0.01 + 0.01 + 0.01 + p(R2) \cdot (0.10 + 0.10 + 0.10 + 0.05 + 0.05 + 0.10) = 0.045 .$$

The maximum probabilities of a relation yielding a value can then be
computed:

$$p(R1 = a) = (0.01 + 0.01 + p(R2) \cdot (0.10 + 0.10 + 0.10 + 0.05)) \cdot (1/e) = 0.67 ;$$

$$p(R1 = b) = (0.01 + 0.01 + p(R2) \cdot (0.10 + 0.05 + 0.05)) \cdot (1/e) = 0.57 ;$$

$$p(R2 = a) = 0.10 + 0.10 + 0.10 + 0.05 + p(R1) \cdot p(R1 = a) = 0.68 ;$$

$$p(R2 = b) = 0.10 + 0.05 + 0.05 + p(R1) \cdot p(R1 = b) = 0.48 .$$
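These quantities are simple enough to recompute. The Python sketch below is a reading of the example, not a definitive implementation: it assumes that the third directly valued entry of R1 is the unspecified one (so that it counts toward both a and b), and the variable names are hypothetical. It also evaluates the upper bound of Proposition 5.1 with the costs of R2 weighted by p(R2).

```python
# Probabilities read from the tables of Example 5.4.
p_R2 = 0.01 + 0.01 + 0.01    # probability that R1 calls R2
p_R1 = 0.25 + 0.25           # probability that R2 calls R1

# Mass of R1 entries yielding a value directly; the unspecified entry
# (assumed here) may be set to either a or b, hence it appears in both.
r1_direct = {"a": 0.01 + 0.01, "b": 0.01 + 0.01}
# Same for R2, again counting its unspecified entry in both.
r2_direct = {"a": 0.10 + 0.10 + 0.10 + 0.05, "b": 0.10 + 0.05 + 0.05}

# Probability that an evaluation ends without any recursive call.
e = 0.01 + 0.01 + 0.01 + p_R2 * (0.10 + 0.10 + 0.10 + 0.05 + 0.05 + 0.10)

max_p_R1 = {v: (r1_direct[v] + p_R2 * r2_direct[v]) / e for v in "ab"}
max_p_R2 = {v: r2_direct[v] + p_R1 * max_p_R1[v] for v in "ab"}

# Upper bound of Proposition 5.1, with the costs of R2 weighted by p(R2).
c_max = (4.5 + 9 + 9 + p_R2 * (50 + 45 + 36)) / e

print(round(e, 3), max_p_R1, max_p_R2, round(c_max, 1))
```

The unrounded values are about 0.678, 0.578, 0.689, and 0.489; the figures quoted in the text appear to be truncated to two decimals.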

The tree activities are:

$$a_{R1}(x_1) = c(x_1) \cdot (0.76 \cdot p(R1 \neq b) + 0.07 + 0.10 + 0.07) = 2.524 ;$$

$$a_{R1}(x_2) = c(x_2) \cdot (0 + 0 + 0.02 \cdot p(R2 \neq b) + 0 + 0.02 \cdot p(R2 \neq a) + 0) = 0.148 ;$$

$$a_{R1}(x_3) = c(x_3) \cdot (0 + 0.06 \cdot p(R1 \neq a) + 0.02 \cdot p(R2 \neq b) + 0 + 0.05 + 0) = 0.716 .$$

Similarly, $a_{R2}(y_1) = 10.00$, $a_{R2}(y_2) = 10.15$, and
$a_{R2}(y_3) = 14.78$. Thus, the intrinsic cost of the relations, I, is
the sum of the tree activities of R1 and of the tree activities of R2
(weighted by p(R2)) times the recursion factor, 1/e:

$$I = (2.524 + 0.148 + 0.716 + p(R2) \cdot (10.00 + 10.15 + 14.78)) \cdot (1/e) = 98.571 .$$

The upper bound on the expected testing cost, as given by Proposition 5.1,
is

$$C_{max} = (4.5 + 9 + 9 + p(R2) \cdot (50 + 45 + 36)) \cdot (1/e) = 587.3 .$$

Figure 5.3 shows a possible decision diagram, D, for the relations;
the lower bound for the cost of this diagram is the sum of the intrinsic
cost and of the diagram loss of $x_1$:

$$\mathrm{lb}(D) = 98.571 + (4.5 - 2.524) \cdot (1/e) = 142.486 .$$

The cost of the diagram, E(D), can be computed from Theorem 5.3(a) using
the diagram losses and probabilities associated with the nodes:

$$\begin{aligned}
E(D) &= 98.571 + 1.00 \cdot 43.91 + 0.11 \cdot 73.93 + 0.04 \cdot 148.8 \\
     &\quad + 0.02 \cdot 137.7 + 0.02 \cdot 97.7 + 0.02 \cdot 200 \\
     &\quad + 3 \cdot (0.01 \cdot 471.5 + 0.007 \cdot 677.7 + 0.003 \cdot 370.37) \\
     &= 197 ;
\end{aligned}$$

it can also be obtained by solving the linear equation expressing the
cost in terms of itself:

$$\begin{aligned}
E(D) &= 1.00 \cdot 4.5 + 0.85 \cdot E(D) + 0.11 \cdot 9 + 0.04 \cdot 9 \\
     &\quad + 0.09 \cdot E(D) + 0.02 \cdot 9 + 0.02 \cdot 9 + 0.02 \cdot 9 \\
     &\quad + (0.01 + 0.01 + 0.01) \cdot 36 + (0.007 + 0.007 + 0.007) \cdot 45 \\
     &\quad + (0.003 + 0.003 + 0.003) \cdot 50 + (0.005 + 0.005 + 0.005) \cdot E(D) ,
\end{aligned}$$

yielding

$$(1.00 - 0.955) \cdot E(D) = 8.865 ,$$

so that E(D) = 8.865/0.045 = 197. □
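The first-degree equation for E(D) can be solved exactly with rational arithmetic. The Python sketch below is illustrative; it groups the terms of the equation into a constant part (probability of reaching a test times the cost of that test) and a recursive part (probability mass returning to the root), using the values whose totals, 8.865 and 0.955, are stated in the text. The variable names are hypothetical.

```python
from fractions import Fraction as F

# Constant terms: probability of reaching each test times its cost.
constant = (
    F("1.00") * F("4.5")                 # root test x_1
    + F("0.11") * 9 + F("0.04") * 9      # tests of cost 9 below the root
    + 3 * F("0.02") * 9                  # three further tests of cost 9
    + 3 * F("0.01") * 36                 # tests of cost 36 inside R2
    + 3 * F("0.007") * 45                # tests of cost 45 inside R2
    + 3 * F("0.003") * 50                # tests of cost 50 inside R2
)

# Probability mass that leads back to the root through a recursive call.
recursion = F("0.85") + F("0.09") + 3 * F("0.005")

# E(D) = constant + recursion * E(D)  =>  E(D) = constant / (1 - recursion)
expected = constant / (1 - recursion)
print(constant, 1 - recursion, expected)
```

The denominator, 1 - 0.955 = 0.045, is the probability e that no recursive call is needed, which is consistent with the factor 1/e used throughout the example.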


5.4.  Applications  to  Selection  and  Other  Problems 

As mentioned above, a simplified version of activity has been used
as the basis for a branch-and-bound algorithm to find the tree with
minimal expected testing cost (Reinwald 66, Breitbart 75b); both
references pointed out that the same algorithm without backtracking was
a fast, albeit suboptimal, heuristic procedure, but failed to provide
an analysis of its performance.

The same algorithm is easily generalized to the case of recursive
relations; it consists of the following selection rule: when developing
the decision tree for the subfunction f, choose as the root the variable
with the lowest loss; in the case of a tie, choose the variable with
the lowest cost; if a tie subsists, choose any variable. The use of the
loss as a selection criterion is consistent with the requirements set
forth in (Ganapathy 73). As a selection criterion, the loss has several
advantages over similar criteria [such as found in (Pollack 65,
Verhelst 72, Halpern 74a, Breitbart 75a)]: it is simpler to compute,
more general, and optimal in several cases.
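The selection rule can be sketched in Python for a nonrecursive relation. This is a minimal illustration under stated assumptions, not the referenced branch-and-bound algorithm: a node is taken to be a leaf when all remaining output subsets share a value, sensitization is tested by an empty common intersection, unspecified entries are related to the whole output set, and ties beyond cost are broken by variable order. The data are those of the relation R of Example 5.3; all function names are hypothetical.

```python
from itertools import product

DOMAINS = [(0, 1, 2), (0, 1), (0, 1)]
COSTS = [1.0, 1.0, 1.0]
ALL_OUTPUTS = "abcd"
R = {
    (1, 0, 0): "bd",  (2, 0, 0): "d",
    (0, 0, 1): "a",   (1, 0, 1): "b",   (2, 0, 1): "b",
    (0, 1, 0): "abc", (1, 1, 0): "bcd", (2, 1, 0): "c",
    (0, 1, 1): "a",   (1, 1, 1): "a",   (2, 1, 1): "c",
}
P = {
    (0, 0, 0): 0.05, (1, 0, 0): 0.10, (2, 0, 0): 0.00,
    (0, 0, 1): 0.10, (1, 0, 1): 0.10, (2, 0, 1): 0.05,
    (0, 1, 0): 0.10, (1, 1, 0): 0.10, (2, 1, 0): 0.05,
    (0, 1, 1): 0.20, (1, 1, 1): 0.05, (2, 1, 1): 0.10,
}


def vectors(fixed):
    """All input vectors consistent with the partial assignment `fixed`."""
    choices = [(fixed[i],) if i in fixed else DOMAINS[i]
               for i in range(len(DOMAINS))]
    return list(product(*choices))


def common(vecs):
    """Output values shared by every vector in `vecs`."""
    return frozenset.intersection(
        *(frozenset(R.get(x, ALL_OUTPUTS)) for x in vecs))


def weighted_loss(i, fixed):
    """Cost of variable i times the mass on which testing it is not needed."""
    groups = {}
    for x in vectors(fixed):
        groups.setdefault(x[:i] + x[i + 1:], []).append(x)
    idle = sum(P[x] for g in groups.values() if common(g) for x in g)
    return COSTS[i] * idle


def greedy_cost(fixed=None):
    """Expected testing cost of the tree grown by the loss criterion."""
    fixed = fixed or {}
    vecs = vectors(fixed)
    if common(vecs):                      # a single output suffices: leaf
        return 0.0
    free = [i for i in range(len(DOMAINS)) if i not in fixed]
    root = min(free, key=lambda i: (weighted_loss(i, fixed), COSTS[i]))
    reach = sum(P[x] for x in vecs)       # probability of reaching this node
    return COSTS[root] * reach + sum(
        greedy_cost({**fixed, root: v}) for v in DOMAINS[root])


cost = greedy_cost()
print(round(cost, 2))
```

On this relation the rule selects the first variable as root and reaches an expected cost of 1.75, the value reported for the tree of Example 5.3.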

In particular, the loss criterion will always lead to the selection
of an indispensable variable (since such variables have a loss of zero
by definition); such a choice is easily seen to be optimal (Ganapathy 73).
Also, the lower bound of Corollary 5.1 is exactly the expected testing
cost when the relation is on two variables or fewer; it follows that the
loss criterion is optimal for all (recursive) relations on two or fewer
variables.

However, like any other method relying on local optimization, the
loss criterion can lead to pessimal solutions. The following example
illustrates the worst case behavior of the loss criterion for completely
specified Boolean functions with unity costs.

Let f be the Boolean function given by the formula

$$f = x_1 + \bigoplus_{i=2}^{n} x_i ,$$

where $\oplus$ denotes summation modulo 2, and assume the following
probability distribution:

each input vector satisfying $x_1 \cdot \bigoplus_{i=2}^{n} x_i = 1$ has
probability $\gamma \cdot \varepsilon$, for $\gamma < 1$, $\gamma \approx 1$;

each input vector satisfying $x_1 = 0$ has probability $\varepsilon$;

all other input vectors have probability
$\alpha = 2^{2-n} - (\gamma + 2) \cdot \varepsilon$.

Then the activity of $x_1$ is

$$a_f(x_1) = 2^{n-2} \cdot (\gamma + 1) \cdot \varepsilon$$

and that of any other variable, $x_i$, is

$$a_f(x_i) = 2^{n-1} \cdot \varepsilon ,$$

so that $l_f(x_i) < l_f(x_1)$. The two subfunctions resulting from the choice
of some $x_i$, $i \neq 1$, as the root test are again of the form
$x_1 + \bigoplus x_j$, so that the trees constructed by the loss criterion
test variable $x_1$ last of all (on half the branches, the others ending in
a leaf at depth n - 1) and have cost:

$$E(T) = n - 1 + 2^{n-2} \cdot (\gamma + 1) \cdot \varepsilon ,$$

while the optimal trees test $x_1$ first and have cost:

$$E(T_o) = 1 + (n - 1) \cdot 2^{n-1} \cdot \varepsilon .$$

(The case n = 4 is illustrated in Figure 5.4.) Thus, if $\varepsilon \ll 1$ (e.g.,
if  e  =  2  for  some  k  >  1) ,  the  asymptotic  ratio  of  costs  becomes 

$$E(T)/E(T_o) = n - 1 .$$

By letting every point satisfying

$$x_1 \cdot \bigoplus_{i=2}^{n} x_i = 1$$

be  mapped  to  a  recursive  call,  the  worst  case  for  completely  specified 
recursive  Boolean  functions  is  obtained.  In  that  case,  the  best  diagram, 
D^,  has  a  cost  of  1 1  +  (n  -  1)*2''  -  2^  while  the  diagrams 

constructed  by  the  loss  criterion,  D,  have  a  cost  of  n/(l  -  2^ 
thus, the asymptotic ratio, $E(D)/E(D_o)$, becomes approximately n for
small $\varepsilon$. (That both recursive and nonrecursive cases yield the same
worst case, O(n), is due to the fact that the recursive factor, 1/e,
is independent of tree structure and gets factored out in the ratio.)

In this example, however, the lower bound set by Corollary 5.1
remains arbitrarily close to the optimal cost, since:

$$\mathrm{lb}_f(x_i) = 1 + (n - 2) \cdot 2^{n-1} \cdot \varepsilon + 2^{n-2} \cdot (\gamma + 1) \cdot \varepsilon , \quad i \neq 1 ,$$

so that $E(T_o) - \mathrm{lb}_f(x_i) = 2^{n-2} \cdot (1 - \gamma) \cdot \varepsilon \approx 0$.
Thus, the pessimal behavior of the loss criterion could be detected at an
early stage in the construction of the tree and the selections revised.
This is not to say that the lower bound set by Corollary 5.1 always
remains close to the optimal expected testing cost: it is easy to
construct an identification problem with n binary variables of unity
cost and $2^{n-1}$ equally likely objects so that the lower bound is always
1 while the optimal cost is n - 1 (it is enough to have each object
characterized by a combination of test values, the sum of which is equal
to 1 modulo 2).

Figure 5.4. The two trees, with their leaf probabilities, for the
worst case performance of the loss criterion when n = 4: (a) the optimal
tree, $T_o$, and (b) the pessimal tree, T.

It  must  be  noted,  however,  that  there  is  little  use  for  any 
selection  heuristics,  since  an  equally  efficient  optimal  algorithm  is 
available  [the  dynamic  programming  algorithm  of  (Bayes  73,  Schumacher  76)], 
which  can  easily  be  extended  to  apply  to  recursive  relations. 

Nevertheless,  a  selection  criterion  may  be  useful  in  that  it  indicates 
the  importance  of  individual  variables;  in  particular,  the  activity  has 
potential  applications  as  a  measure  of  complexity  (further  investigated 
in  the  next  section),  as  a  tool  for  system  testing  (which  application  is 
the  subject  of  Chapter  6) ,  and  as  a  gauge  of  the  power  of  discrimination 
of  a  variable  in  data  base  queries,  pattern  recognition,  and  other 
decision  problems. 

5.5.  The  Intrinsic  Cost  as  a  Measure  of  Complexity 

As indicated in Chapter 1, current measures of the complexity of
discrete functions are not entirely satisfying, mostly because of their
dependence on some form of implementation. Thus, software science
(Halstead 77) studies the complexity of computer programs as perceived
by humans, combinational complexity (Savage 76, Pippenger 77) measures
the complexity of Boolean functions in terms of the number of gates in a
circuit or the number of terms in a Boolean formula, and even concrete
computational complexity (Aho 74, Garey 79) uses a computer-like model
on which space and time measures are developed. Other measures [a
recent and succinct survey of which can be found in (Rouse 79)] are even
more specialized.

The pitfalls of an approach keyed to a particular mode of
implementation are vividly illustrated by a comparison between the
results of combinational complexity and those of (Lee 59), both about
Boolean functions. Combinational complexity is concerned with the
number of gates (each realizing one of the sixteen Boolean functions of
two variables) necessary to compute a function; for a function of n
variables, it is known that $O(2^n/n)$ gates are needed when subfunctions
can be regrouped, and $O(2^n/\log n)$ when they cannot (as in representation
by unfactored Boolean formulae). Lee's results show that a function of
n variables can be represented with $O(2^n/n)$ nodes by a decision diagram;
this is equal to the necessary number of gates, but only $O(n)$ operations
are needed in decision diagrams versus $O(2^n/\log n)$ in a combinational
network. This clearly indicates that combinational complexity is
inadequate as an implementation-independent measure of complexity and
that, moreover, decision trees and diagrams capture some essential
property of discrete functions that is not reflected in some other
measures.

Concrete complexity theory, as mentioned in Chapter 1, has made use
of decision trees in order to derive lower bounds on the computational
complexity of classes of problems. However, the approach taken
(measuring the worst case testing cost) is directly dependent on the
decision tree model and, as seen in Section 4.3.3, the results are
mostly trivial since almost all Boolean functions are exhaustive.

In  conclusion,  then,  a  measure  is  desired  which  captures  some  of 
the  aspects  of  decision  trees  but  does  not  directly  depend  on  them.  Such 
is  the  intrinsic  cost  and  its  use  as  a  complexity  measure  for  discrete 
functions  is  proposed  here.  The  intrinsic  cost  is  implementation- 
independent;  it  is  a  lower  bound  on  the  cost  of  any  type  of  evaluation 
of  a  function.  Being  the  sum  of  activities,  it  also  lends  itself  to  an 
analysis  of  the  influence  of  individual  variables;  similarly,  since  the 
activity  is  compatible  with  the  process  of  decomposition,  the  intrinsic 
cost  of  subfunctions  is  readily  computed  from  that  of  the  given  function. 
Both properties are, of course, indispensable for a system design and
analysis tool.

Validating  an  applied  complexity  measure  is  a  large  undertaking, 
beyond  the  present  scope  of  this  investigation.  It  is  felt,  nevertheless, 
that  the  results  available  in  the  literature  and  presented  in  this  work 
justify  the  need  for  a  complexity  measure  such  as  the  intrinsic  cost. 
Further  justification  may  be  found  in  the  close  relationship  between 
activity  and  some  problems  of  testing,  which  relationship  is  discussed 
in  the  next  chapter. 


CHAPTER  6 


APPLICATION  TO  SYSTEM  TESTING 


6.1.  Introduction 

Decision trees have been used by numerous researchers for purposes
of machine diagnosis (Hoehn 58, Brule 60, LaMacchia 62, Chang 70,
Koran 77); in most cases, however, the tree is the specification of a
testing algorithm. More recently, (Akers 78a, 78b) proposed to use
Boolean graphs (that he called decision diagrams) as representations of
Boolean functions in order to develop testing schemes. He also included
sequential systems by considering only their next state function and
using decomposition to the point where, it was reasoned, any given part
of the analysis encounters few next state variables.

A different method is adopted here. In keeping with modern system
architecture, in particular the use of large scale integration, it is
assumed that a system has many more internal variables than can
reasonably be controlled (or even examined) in a testing experiment.

The  emphasis  is  placed  on  signal  reliability  (Koren  79),  that  is,  a 
measure  of  the  probability  that  the  system's  output  is  correct,  rather 
than  the  conventional  functional  reliability,  a  more  pessimistic  measure 
based  on  the  probability  that  a  system  is  fault  free.  Combinational 
systems  are  first  examined  and  a  relation  is  established  between  the 
activity  of  a  variable  and  the  probability  of  detecting  a  faulty  output. 
The  results  are  then  extended  to  sequential  systems  using  a  steady  state 
model.  Finally,  further  applications  are  briefly  discussed. 
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6.2.  Combinational  Systems 

A combinational system has no internal memory and is entirely
described by its output relation (according to the premises stated in
Chapter 1). Figure 6.1 illustrates such a system, F, with n input
variables and an output specified by a relation, R, on the input
variables.


$x_1, \ldots, x_n$ → F → $R(x_1, \ldots, x_n)$


Figure  6.1.  A  combinational  system. 


To  test  a  newly  produced  or  previously  used  version  of  F  by 
verifying  its  outputs  while  all  input  vectors  are  applied — a  method 
called  exhaustive  testing — is  a  task,  the  complexity  of  which  grows 
exponentially  with  the  number  of  variables;  accordingly,  exhaustive 
testing  quickly  becomes  impractical.  This  has  prompted  the  study  of 
alternate  approaches,  which  do  not  guarantee  the  detection  of  every  fault 
(only  exhaustive  testing  can) ,  but  minimize  the  probability  that  an  error 
will  go  undetected.  In  particular  (Losq  76,  78)  has  shown  that  random 
compact  testing  is  effective  for  large  scale  systems  realizing 
combinational  or  sequential  functions.  In  this  method,  all  inputs  are 
controlled;  a  sequence  of  random  input  vectors  is  applied  and  output 
statistics  (such  as  the  frequency  of  occurrence  of  each  value)  are 
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gathered.  The  test  statistics  are  then  compared  with  the  output 
statistics  of  a  perfect  unit  (the  "gold  unit")  and  the  unit  under  test 
is  accepted  as  being  in  working  order  if  its  output  statistics  fall 
within  an  "acceptance  window"  around  those  of  the  gold  unit.  (Losq  78) 
showed  that,  when  all  faults  are  assumed  to  be  permanent  (i.e.,  lines 
"stuck-at"  a  particular  value) ,  the  probability  of  acceptance  of  a 
faulty  unit  approaches  zero  as  the  test  length  increases  and  is  very 
small  already  for  tests  that  are  much  shorter  than  exhaustive  sequences. 

In large systems in operation, it is not always desirable (or even
feasible) for a test generator to assume control of all input variables.
Several variables may be altogether unavailable (for instance, the
secondary variables of a sequential circuit) while the value of others
cannot be modified during operation. In such a case, one is reduced to
controlling a subset of the inputs and allowing the others to vary. A test
then consists of an exhaustive application of all the combinations of
controlled inputs and a verification of the output for each input vector.

Permanent faults are assumed; since the system's implementation is
unknown, only input-output relationships can be observed and thus any
test will measure signal reliability. Specifically, a fault is assumed
to cause the system output to be stuck at a given value at a time when
a gold unit would produce a different result. Let the system, F, have
the output set, $\Omega$, and, in first approximation of the fault model, let
$A_i$, $i = 1, \ldots, |\Omega|$, be the event that a failure causes the system's
output to be stuck at the i-th output value. Since those events are



Now, let the set of controlled variables be composed of the single
variable, $x_i$; the other variables are allowed to vary according to the
known probability distribution over input variable values. [Essentially,
the remaining inputs are assumed to arise from a zero-th order Markov
process (Parzen 62), in which each set of values is selected according
to fixed probabilities independently of preceding values.] The test
sequence accordingly consists of all $m_i$ values that $x_i$ can assume. An
incorrect output will be observed during the test exactly if either

(a) a fault is present and the relation is sensitized to $x_i$, or

(b) the relation is not sensitized to $x_i$ but is stuck at a value other
than any that could correctly result from the remaining (n - 1)
variables' values.

These are disjoint events, the first occurring with probability

$$p_a = p_R(x_i) \cdot \sum_{k=1}^{|\Omega|} P(A_k)$$

and the second with probability

$$p_b = \sum_{c_j} p(c_j) \cdot P\Bigl(\bigcup_{k} A_k\Bigr)$$

where $p_R(x_i)$ denotes the probability that $R$ is sensitized to $x_i$;
for each $c_j$, the union is taken over all events, $A_k$, whose stuck-at
value cannot correctly result from $c_j$, and the sum is taken over all
combinations, $c_j$, of the remaining $n - 1$ variables such that $R$ is not
sensitized to $x_i$. If $R$ is a completely specified function and the
failure events are equally likely, each with probability $\delta$, those
probabilities become:


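$$p_a = p_R(x_i) \cdot |\Omega| \cdot \delta$$

and

$$p_b = \bar{p}_R(x_i) \cdot (|\Omega| - 1) \cdot \delta ,$$

where $\bar{p}_R(x_i) = 1 - p_R(x_i)$ is the probability that $R$ is not
sensitized to $x_i$, so that the probability of observing an incorrect
output by applying all possible values of variable $x_i$ is

$$\begin{aligned}
p_a + p_b &= p_R(x_i) \cdot |\Omega| \cdot \delta + \bar{p}_R(x_i) \cdot (|\Omega| - 1) \cdot \delta \\
          &= |\Omega| \cdot \delta \cdot \bigl(p_R(x_i) + \bar{p}_R(x_i)\bigr) - \delta \cdot \bar{p}_R(x_i) \\
          &= |\Omega| \cdot \delta - \delta \cdot \bar{p}_R(x_i) .
\end{aligned}$$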



Since $p_R(x_i) = a_R(x_i)$ when costs are unity, it follows that testing
the variable which has the highest activity (i.e., the lowest loss)
maximizes the probability of detecting an incorrect output value.
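
The closed form above can be checked numerically on a small case. The following Python sketch is illustrative only: the example function, the uniform input distribution, the value of `DELTA`, and all helper names are assumptions introduced here and do not come from the text. It restricts itself to Boolean inputs, obtains the sensitization probability of each variable by enumerating the remaining inputs, and then evaluates $|\Omega| \cdot \delta - \delta \cdot \bar{p}_R(x_i)$.

```python
from itertools import product

def sensitization_probability(f, n, i, prob):
    """Probability that f is sensitized to Boolean variable i.

    f    : function taking a tuple of n bits
    prob : prob[j] is the assumed probability that input j equals 1
    The remaining n - 1 inputs are drawn independently; f counts as
    sensitized to x_i for a combination if flipping x_i changes the output.
    """
    others = [j for j in range(n) if j != i]
    total = 0.0
    for combo in product((0, 1), repeat=n - 1):
        x = [0] * n
        weight = 1.0
        for j, bit in zip(others, combo):
            x[j] = bit
            weight *= prob[j] if bit else 1.0 - prob[j]
        x[i] = 0
        out0 = f(tuple(x))
        x[i] = 1
        out1 = f(tuple(x))
        if out0 != out1:
            total += weight
    return total

def detection_probability(p_sens, n_outputs, delta):
    """Evaluate |Omega| * delta - delta * (1 - p_sens)."""
    return n_outputs * delta - delta * (1.0 - p_sens)

# Hypothetical example: f = x0 AND (x1 OR x2) with uniform inputs.
def example_function(x):
    return x[0] & (x[1] | x[2])

N_VARS = 3
INPUT_PROB = [0.5, 0.5, 0.5]
DELTA = 0.01      # assumed probability of each stuck-at event
N_OUTPUTS = 2     # size of the Boolean output set

sens = [sensitization_probability(example_function, N_VARS, i, INPUT_PROB)
        for i in range(N_VARS)]
detect = [detection_probability(p, N_OUTPUTS, DELTA) for p in sens]
best = max(range(N_VARS), key=lambda i: detect[i])
```

In this example the variable `x0` is sensitized three times out of four and therefore yields the largest detection probability, which is the ordering the derivation predicts.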

Theorem 6.1. Under the above assumptions, the probability of
detecting an incorrect output value, $p_d$, obeys the following inequalities:

$$p_d(\text{using the highest activity variable}) \geq
  p_d(\text{using a randomly selected variable}) \geq
  p_d(\text{using the lowest activity variable}) .$$


□ 


The  above  results  extend  in  a  natural  way  to  controlling  subsets  of 
k  out  of  the  n  variables. 
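
One possible reading of this extension is sketched below in Python. It rests on an assumption that the text does not spell out: a function is taken to be sensitized to a controlled subset, for a given combination of the remaining inputs, when its output is not constant over all assignments of that subset. The example function, the uniform input distribution, and the helper names are likewise hypothetical.

```python
from itertools import combinations, product

def subset_sensitization(f, n, subset, prob):
    """Probability that f is not constant over the controlled subset.

    subset : tuple of indices of the controlled Boolean variables
    prob   : prob[j] is the assumed probability that input j equals 1
    """
    others = [j for j in range(n) if j not in subset]
    total = 0.0
    for combo in product((0, 1), repeat=len(others)):
        x = [0] * n
        weight = 1.0
        for j, bit in zip(others, combo):
            x[j] = bit
            weight *= prob[j] if bit else 1.0 - prob[j]
        # Apply every assignment of the controlled variables (the test).
        seen = set()
        for assignment in product((0, 1), repeat=len(subset)):
            for j, bit in zip(subset, assignment):
                x[j] = bit
            seen.add(f(tuple(x)))
        if len(seen) > 1:
            total += weight
    return total

# Hypothetical example: f = x0 AND (x1 OR x2), controlling k = 2 inputs.
def example_function(x):
    return x[0] & (x[1] | x[2])

N_VARS, K = 3, 2
INPUT_PROB = [0.5, 0.5, 0.5]
subset_scores = {
    s: subset_sensitization(example_function, N_VARS, s, INPUT_PROB)
    for s in combinations(range(N_VARS), K)
}
best_subset = max(subset_scores, key=subset_scores.get)
```

Under this reading, any pair containing `x0` is always sensitized in the example, whereas the pair `(x1, x2)` is sensitized only when `x0` equals 1.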


6.3.  Extension  to  Sequential  Systems 

A sequential system incorporates memory. The standard model of a
discrete parameter system with memory distinguishes the memory unit and
the combinational subsystem, as illustrated in Figure 6.2. The system's
output is a relation on the primary (external) variables, $x_1, \ldots, x_n$,
and the secondary (internal) variables, $y_1, \ldots, y_m$. [When the system
is Boolean, this model is known as a Mealy machine (Friedman 75).] It
is often the case in practice that the memory unit values, $y_1, \ldots, y_m$,
are not directly controllable or even observable. For instance, the
limited number of pins available on packages for large scale integrated
circuits does not usually allow the allocation of any connections to the
secondary input variables. It is therefore assumed that only the primary
variables can be controlled, even for purposes of testing; further, it is
assumed that a fixed probability distribution over the vectors of values
of the primary inputs is known, which represents the normal operating
conditions of the system.

Figure 6.2. The conventional model of a sequential system.

Since the system is discrete and finite, it can assume only a finite
number of states, one for each combination of values of the secondary
variables. The probability, $p_{ij}$, of transition from state $i$ to state $j$
can then be computed from the system's known relations and the
probability distribution of the primary variables. Thus, the system
can be considered as a Markov chain (Parzen 62, Booth 67), which is
irreducible (since every state can be reached from every other state) and
positive recurrent (since the system eventually returns to each state
with a probability of one). This is a stochastic process about which,
in particular, the following results are known.
(a) There is convergence in Cesaro mean to a vector of values that
can be interpreted as the probability distribution over the states in
the long run; that is, the $i$-th element, $\pi_i$, in that vector is the
fraction of time that, in the limit, the system spends in state $i$. The
elements of the vector can be computed by a set of linear equations of
the form

$$\pi_j = \sum_i \pi_i \, p_{ij} , \qquad \sum_i \pi_i = 1 .$$

(b) Let $N_i(n)$ be the number of times that the Markov chain is in
state $i$ during its visits to the first $n$ states; then this occupation
time of $i$ is asymptotically gaussian with expectation $n \cdot \pi_i$ and
variance $n \cdot \sigma_{ii}^2 \cdot \pi_i^3$, where $\sigma_{ii}^2$ is the
variance of the random variable describing the recurrence time of state $i$.
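
Properties (a) and (b) can be made concrete with a short Python sketch. Everything specific in it is an assumption made for illustration: the two-state machine `next_state`, the uniform distribution over two Boolean primary inputs, and the helper names. The recurrence-time moments are obtained by a truncated sum over first-return probabilities, which is one convenient technique and not necessarily the one the cited references use.

```python
from itertools import product

def transition_matrix(next_state, n_states, input_dist):
    """p[i][j] = probability of a one-step move from state i to state j."""
    p = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        for x, px in input_dist.items():
            p[i][next_state(i, x)] += px
    return p

def stationary_distribution(p):
    """Solve pi_j = sum_i pi_i p[i][j] with sum_i pi_i = 1 (Gauss-Jordan)."""
    n = len(p)
    # n - 1 balance equations, then the normalization equation.
    a = [[p[i][j] - (1.0 if i == j else 0.0) for i in range(n)] + [0.0]
         for j in range(n - 1)]
    a.append([1.0] * n + [1.0])
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [u - factor * v for u, v in zip(a[r], a[col])]
    return [a[r][n] / a[r][r] for r in range(n)]

def recurrence_moments(p, i, tol=1e-12, max_steps=100000):
    """Mean and variance of the recurrence time of state i (truncated)."""
    n = len(p)
    # alive[j]: probability of being in j without having returned to i yet.
    alive = [p[i][j] if j != i else 0.0 for j in range(n)]
    t, mean, second = 1, p[i][i], p[i][i]
    while sum(alive) > tol and t < max_steps:
        t += 1
        f_t = sum(alive[j] * p[j][i] for j in range(n))  # first return at t
        mean += t * f_t
        second += t * t * f_t
        alive = [sum(alive[j] * p[j][k] for j in range(n)) if k != i else 0.0
                 for k in range(n)]
    return mean, second - mean * mean

# Hypothetical machine: one secondary variable, two primary inputs.
def next_state(y, x):
    return x[0] if y == 0 else (x[0] | x[1])

input_dist = {x: 0.25 for x in product((0, 1), repeat=2)}
P = transition_matrix(next_state, 2, input_dist)
pi = stationary_distribution(P)
mean_rec, var_rec = recurrence_moments(P, 0)

N_STEPS = 1000
occ_mean = N_STEPS * pi[0]               # expectation n * pi_i
occ_var = N_STEPS * var_rec * pi[0] ** 3  # variance n * sigma_ii^2 * pi_i^3
```

For this machine the chain spends one third of its time in state 0, and the mean recurrence time of that state is the reciprocal of its long-run probability.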

The first property implies that, in the long run, a fixed
probability distribution on the primary input values induces a fixed
distribution on the secondary variables' values (the states). This
allows the computation of the activities of the primary variables;
long-run testing by controlling only the primary variables (and not letting
the test sequences alter the predetermined probability distribution) can
then be carried out in a way similar to the testing of combinational
systems described in the previous section, with the same results.
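
As a rough illustration of this computation, the Python sketch below evaluates the probability that the output is sensitized to each primary variable when the state is drawn from the long-run distribution. The output function, the state distribution `STATE_DIST`, the input probabilities, and the helper name are all hypothetical. The state is treated as independent of the current inputs, which relies on the zero-th order assumption on the input process, and unit costs are assumed so that the sensitization probability stands in for the activity.

```python
from itertools import product

def primary_sensitization(output, n_inputs, i, input_prob, state_dist):
    """Probability that the output is sensitized to primary input i.

    output     : function (state, input_vector) -> output value
    input_prob : input_prob[j] is the probability that input j equals 1
    state_dist : long-run probability of each state (assumed known)
    """
    others = [j for j in range(n_inputs) if j != i]
    total = 0.0
    for state, p_state in enumerate(state_dist):
        for combo in product((0, 1), repeat=len(others)):
            x = [0] * n_inputs
            weight = p_state
            for j, bit in zip(others, combo):
                x[j] = bit
                weight *= input_prob[j] if bit else 1.0 - input_prob[j]
            x[i] = 0
            z0 = output(state, tuple(x))
            x[i] = 1
            z1 = output(state, tuple(x))
            if z0 != z1:
                total += weight
    return total

# Hypothetical Mealy-style output: z = x0 AND (x1 OR y), y being the state.
def example_output(y, x):
    return x[0] & (x[1] | y)

STATE_DIST = [1.0 / 3.0, 2.0 / 3.0]   # assumed long-run state probabilities
INPUT_PROB = [0.5, 0.5]

activities = [primary_sensitization(example_output, 2, i,
                                    INPUT_PROB, STATE_DIST)
              for i in range(2)]
most_active = max(range(2), key=lambda i: activities[i])
```

Here `x0` is the more active primary variable, so under the assumptions above it would be the preferred variable to control during long-run testing.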


6.4.  Discussion 

The  results  presented  in  the  previous  sections  are,  of  course,  only 
preliminary.  In  particular,  the  relationship  between  activity  and  the 
probability of error detection must be worked out in the more general
case  of  nonuniform  failure  probabilities,  and  the  fault  model  itself 
must  be  validated. 

As  a  first  step  toward  the  latter  goal,  it  must  be  noted  that  the 
fault  model  adopted  in  this  chapter,  which  assumes  that  only  inputs  and 
outputs  (i.e.,  external  signals)  are  observable,  is  better  suited  to 
modern  methods  of  system  implementation,  especially  large  scale 
integration,  than  the  conventional  model  of  functional  reliability. 

Given  a  system  implemented  by  a  single  very  large  scale  integrated  (VLSI) 
circuit,  the  engineer  is  concerned  about  its  proper  functioning,  that  is, 
a  correct  input-output  behavior;  if  an  error  is  detected,  the  whole 
package  is  replaced:  the  nature  of  the  problem  inside  the  package  is  of 
no  concern. 

When  testing  a  combinational  circuit  "on  the  bench"  (that  is,  not 
in  operation),  all  of  the  inputs  are  usually  controllable,  so  that 
testing  only  part  of  the  variables  may  be  an  unnecessary  restriction; 
even  if  exhaustive  testing  is  not  permissible,  random  compact  testing 
can  be  used.  While  the  latter  is  certainly  indicated,  there  may  yet  be 
reasons  for  which  an  exhaustive  test  of  the  variables  with  highest 
activities would be preferred. In particular, such a test thoroughly
checks out the behavior of the system with respect to the most
"significant" variables; as such, a successful test indicates that the
functionally important parts of the system are in working order. A
measure of the extent of these parts is then the ratio of the sum of the
activities of the exhaustively tested variables to the intrinsic cost of
the system.

This  property  has  potential  applications  to  the  quality  control  of 
VLSI  circuits,  where  the  yield  of  perfect  circuits  is  often  low.  One 
could  conceive  a  test  conducted  on  the  most  relevant  variables  in  order 
to  guarantee  that  at  least  the  important  parts  of  the  circuit  are  in 
working  order;  those  imperfect  circuits  that  passed  the  test  could  then 
be  released  for  noncritical  applications,  rather  than  altogether 
discarded.  (This  could  lower  the  price  of  VLSI  production;  for  further 
recycling  of  costly  products,  the  process  could  be  coupled  with  limited 
repair  capabilities  in  order  to  bring  more  circuits  within  the 
predetermined  percentage  of  the  intrinsic  cost  necessary  for  acceptance.) 

Yet  other  applications  are  foreseeable.  Clearly,  however,  a  good 
deal  of  prior  research  is  necessary. 


CHAPTER  7 


CONCLUSIONS  AND  RECOMMENDATIONS 

The use of decision trees and diagrams as models of discrete
functions has been investigated. A general framework has been
elaborated, into which the diverse results available in the literature
have been brought. In particular, a complete analysis of the complexity
of  optimization  of  decision  trees  has  been  presented,  including  several 
new  results  on  the  worst  case  computational  complexity  of  Boolean 
functions.  After  a  discussion  of  the  various  measures  defined  on 
decision  trees  and  diagrams,  a  single  measure,  the  expected  testing  cost, 
was  selected  as  representative  of  the  complexity  of  decision  trees.  This 
measure  was  further  examined  in  order  to  develop  a  measure  of  complexity 
on  functions.  In  particular,  the  concept  of  the  activity  of  a  variable, 
a  measure  of  the  contribution  of  a  variable  to  the  testing  cost  of  a 
function,  was  introduced,  and  its  relation  to  decision  trees  was 
established.  These  concepts  were  shown  to  be  applicable  to  recursive 
relations  and  hierarchies  of  relations. 

After  a  brief  discussion  of  the  applications  of  activity  to  the 
construction  of  decision  trees,  its  use  as  a  complexity  measure  for 
discrete  functions  was  proposed  and  discussed.  Finally,  the  application 
of  activity  to  problems  of  testing  was  examined,  with  particular  emphasis 
on the testing of large scale integrated systems in operation. Further
research  is  needed  in  order  to  validate  the  proposed  measure  of 
complexity and develop some specific characterizations. Although the
testing  applications  introduced  in  the  previous  chapter  are  of  a 
preliminary  nature,  they  clearly  indicate  the  potential  of  the  concept 
of  activity  for  solving  some  of  the  acute  problems  encountered  in 
testing  large  systems. 